Postgresql – 30 days rolling sum as base for fee calculation. Faster solution

performancepostgresqlquery-performance

I need to calculate PayPal fees or an online shop to show an overview table with the fees per day. I came up with a solution playing this over several views, but it's dead slow. I takes more than a seven minutes for around 4200 original rows of order data with Paypal fees to calculate.

Hence, I am looking for a faster solution, but this is over my head.

Paypal fees have a fix base price per transaction and a percentage on top, which is tiered. The tier is selected by the sum of turnaround payed with PayPal in the last month (I assume 30 days).

This is the tiered fee information:

  • tier 0 0 – 2.500 €: 3,4% + 0,35 €
  • tier 1 2.501 – 10.000 €: 2,9% + 0,35 €
  • tier 2 10.001 – 50.000 €: 2,7% + 0,35 €
  • tier 3 50.001 – 100.000 €: 2,4% + 0,35 €
  • tier 4 100.001 – unlimited €: 1,9% + 0,35 €

What I came up so far are a couple of views based on order data imported from the shop:

-- values are all multiplied by 100 to have int values

-- calculate the fees on every order. all possible tiers.
CREATE VIEW public.shopify_created_paypal_tiered_fees AS
SELECT order_id,
             created_at::DATE as date
           total_price, 
             round(0.35 + customer_paid * 0.034, 0) as tier_0_fee,
             round(0.35 + customer_paid * 0.029, 0) as tier_1_fee,
             round(0.35 + customer_paid * 0.027, 0) as tier_2_fee,
             round(0.35 + customer_paid * 0.024, 0) as tier_3_fee,
             round(0.35 + customer_paid * 0.019, 0) as tier_4_fee
  FROM public.shopify_order_financial_data
 WHERE payment_gateway_names @> ARRAY['paypal']::text[];

-- group and sum them up by day
CREATE VIEW public.shopify_created_paypal_tiered_fees_by_day AS
SELECT created_at_date, 
       sum(total_price) as total_price_sum, 
       sum(tier_0_fee) as tier_0_fee_sum, 
       sum(tier_1_fee) as tier_1_fee_sum, 
       sum(tier_2_fee) as tier_2_fee_sum, 
       sum(tier_3_fee) as tier_3_fee_sum, 
       sum(tier_4_fee) as tier_4_fee_sum
  FROM public.shopify_created_paypal_tiered_fees
 GROUP BY created_at_date
 ORDER BY created_at_date desc;

-- calculate the rolling sum of the last 30 days per day.
CREATE VIEW public.shopify_created_pp_turnaround_of_last_30_days_by_day AS
SELECT created_at_date, 
       total_price_sum, 
       (SELECT sum(total_price_sum)
          FROM public.shopify_created_paypal_tiered_fees_by_day
         WHERE created_at_date BETWEEN SYMMETRIC oft.created_at_date::DATE 
                                   AND (oft.created_at_date::DATE - INTERVAL '30 DAY')::DATE) as pp_turnaround_of_last_30_days
  FROM public.shopify_created_paypal_tiered_fees_by_day as oft;

-- based on rolling sum 30 days before order's created_at_date use the correct fee and add info or which one was used.
CREATE VIEW public.shopify_created_paypal_active_tier_fees AS
SELECT pptfbd.created_at_date, 
       pptolm.total_price_sum,
       pp_turnaround_of_last_30_days,
       CASE 
         WHEN pp_turnaround_of_last_30_days BETWEEN 0 AND 250000 THEN tier_0_fee
         WHEN pp_turnaround_of_last_30_days BETWEEN 250000 AND 1000000 THEN tier_1_fee
         WHEN pp_turnaround_of_last_30_days BETWEEN 1000000 AND 5000000 THEN tier_2_fee
         WHEN pp_turnaround_of_last_30_days BETWEEN 5000000 AND 10000000 THEN tier_3_fee
         ELSE tier_4_fee
       END as pp_fee_for_day,
           CASE 
         WHEN pp_turnaround_of_last_30_days BETWEEN 0 AND 250000 THEN 'tier 0'
         WHEN pp_turnaround_of_last_30_days BETWEEN 250000 AND 1000000 THEN 'tier 1'
         WHEN pp_turnaround_of_last_30_days BETWEEN 1000000 AND 5000000 THEN 'tier 2'
         WHEN pp_turnaround_of_last_30_days BETWEEN 5000000 AND 10000000 THEN 'tier 3'
         ELSE 'tier 4'
       END as pp_tier_for_day,
           CASE 
         WHEN pp_turnaround_of_last_30_days BETWEEN 0 AND 250000 THEN '3,4% + 0,35 €'
         WHEN pp_turnaround_of_last_30_days BETWEEN 250000 AND 1000000 THEN '2,9% + 0,35 €'
         WHEN pp_turnaround_of_last_30_days BETWEEN 1000000 AND 5000000 THEN '2,7% + 0,35 €'
         WHEN pp_turnaround_of_last_30_days BETWEEN 5000000 AND 10000000 THEN '2,4% + 0,35 €'
         ELSE '1,9% + 0,35 €'
       END as pp_fee_formula_for_day,
       tier_0_fee, 
       tier_1_fee, 
       tier_2_fee, 
       tier_3_fee, 
       tier_4_fee
  FROM public.shopify_created_paypal_tiered_fees pptfbd
 INNER JOIN public.shopify_created_pp_turnaround_of_last_30_days_by_day pptolm ON pptfbd.created_at_date = pptolm.created_at_date;

Any suggestions for speeding this up? Thanks!

Best Answer

The issue you are running into is because you are creating nested views which can cause postgress to lose track of indexes and not use them properly. This is something that I have noticed before when you get multiple layers deep the optimizer will no longer be using the indexes. A good way to check this would be do an explain on each view.

What you can do is use a CTE(Common Table Expression) in order to do all the calculations in a single view. This will buy you two things.

  1. The optimizer will be able to hit all indexes
  2. You will be able to limit the data you are looking at to the last 30 days so the database will have less items to work at which will help performance.

https://www.postgresql.org/docs/current/queries-with.html