I have a scenario where I need to run a payroll report. The report calculates the payroll amount, grouped by staff member, for a specific date range.
For example, when running the report for 2016-11-01 to 2016-11-30, I would see the following result:
Staff Id     Total
------------------
       1    123.00
       2    439.22
I'm using the following query for the above report:
select
    user_id as staff_id,
    sum(amount) as total
from transaction
where
    business_id = <business_id> and
    type = 'staff' and
    kind = 'commission' and
    created_at between <start_date> and <end_date>
group by
    user_id;
I'm trying to determine the best way to optimize the performance of this query given the following requirements:
- Results will vary based on the business_id, start_date and end_date
- Data should always be fresh
It appears both views and functions would do the job, but I'm not 100% sure which is the best approach given the requirements.
Sidenote: it would be great to cache the data based on the parameters mentioned above, but it seems like there isn't a great solution on the database side. Correct me if I'm wrong!
Additional information:
- I'm running Postgres 9.6
- I have indexes on the business_id, type, kind, user_id and created_at columns in the transaction table. These are all single-column btree indexes.
Best Answer
A view cannot help you produce an aggregate based on unknown parameters (business_id, start_date and end_date). It is nothing other than a given query, stored permanently in the database for later reuse. (Well, the implementation is more complicated, but that does not affect their usage.)

Considering your query, the furthest you can get with a view is:
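A sketch of such a view, reconstructed from the query in the question (the view name is illustrative):

```sql
-- Bake in only the filters that never change between runs.
CREATE VIEW staff_commission AS
SELECT business_id,
       user_id,
       amount,
       created_at
FROM   transaction
WHERE  type = 'staff'
AND    kind = 'commission';
```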
Everything that is known beforehand is there, plus the columns you need for producing the desired output.
For obtaining the latter, you have to create a function in any case (optionally working from the view). Building on the view makes sense when you have several queries that filter the data the same way.
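Such a function could look like the sketch below; the function name and the column types (integer keys, numeric amounts, timestamptz dates) are assumptions, so adjust them to the actual schema:

```sql
CREATE FUNCTION staff_payroll(
    p_business_id integer,        -- parameter names/types are assumed
    p_start_date  timestamptz,
    p_end_date    timestamptz
) RETURNS TABLE (staff_id integer, total numeric)
LANGUAGE sql STABLE AS
$$
    SELECT user_id AS staff_id,
           sum(amount) AS total
    FROM   transaction
    WHERE  business_id = p_business_id
    AND    type = 'staff'
    AND    kind = 'commission'
    AND    created_at BETWEEN p_start_date AND p_end_date
    GROUP  BY user_id;
$$;

-- Usage:
-- SELECT * FROM staff_payroll(42, '2016-11-01', '2016-11-30');
```

Because the body is plain SQL and marked STABLE, the planner can inline it, so it runs with the same plan as the hand-written query; the parameters are simply supplied at call time.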
The performance of all these will be the very same. Whether the indexes make sense depends very much on the actual data. (I am pretty sure you don't need all of them, though.) Without knowing anything about the data, it is very hard to guess which improvements are needed. Try what you have, check the EXPLAIN ANALYZE output and see if there is something missing.

Finally, about caching: PostgreSQL is very smart about this. Going into details would fill a chapter or two in a book, but I would not worry about it until I see too many disk reads (which can be seen from EXPLAIN (ANALYZE, BUFFERS)).
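That check is a one-liner around the report query itself (the literal values below are placeholders):

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT user_id AS staff_id,
       sum(amount) AS total
FROM   transaction
WHERE  business_id = 42
AND    type = 'staff'
AND    kind = 'commission'
AND    created_at BETWEEN '2016-11-01' AND '2016-11-30'
GROUP  BY user_id;
```

In the output, the Buffers lines distinguish shared hit= (pages already in Postgres's cache) from read= (pages fetched from the operating system); a persistently large read count on repeated runs is the signal that caching or indexing deserves attention.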