Postgresql – Indexes and windowing functions

indexindex-tuningpostgresql

We have a design in which a table stores the history of changes to an entity, and a view queries the table for the latest row for that entity.

Say, employee_history and employee (the view with latest status).

The view is defined using a rank() function:

create view employee as (
  select * from (
    select
      *
      , rank() over (partition by employee_id order by last_update_ts desc)
    from employee_history
  ) e
  where r = 1
)

There is an index over the employee_id and the update timestamp, both fields used in the view definition.
Most of the queries over this view are for a specific employee, with its ID. So the query engine makes good use of the index.

But now we need a query to find employees in some department, using the department name. This view definition cannot make use of the index.

How may I get a more performant query over the history table? I can create new indices and new views. I cannot create a new table containing only the current data; it has to be over the history table.

This is on Postgres 11.

Best Answer

Your query for a specific department can make use of the index:

create index on employee_history (employee_id , last_update_ts desc);

The index without the desc indeed cannot be used. I don't think there is a good reason for this, it is just some kind of oversight in the planner. However, this index usage of the index that can be used is not going to be particularly fast anyway. It will have to read the entire index and table. It will just avoid the sort of the entire table, not the reading of it.

This is a fundamental limitation of your questionable design. Without some extra limitations (like employees can never move between departments) I don't see anything you can do about this in general.

Related Solutions

Sql-server – Similar indices in MS SQL Server

The indexes are similar but not identical. The order of the columns in an index matters - a lot - and affects what queries might benefit from them.

We don't know your queries but for example a query like the following would benefit most from index A:

SELECT TOP (3) ChannelSKUID, Checked, SalesRank
FROM dbo.SalesRankCheck
WHERE ChannelSKUID = 'XXX'
ORDER BY Checked ;

while this one would benefit from index B:

SELECT TOP (3) ChannelSKUID, Checked, SalesRank
FROM dbo.SalesRankCheck
WHERE SalesRank = 1 
ORDER BY Checked ;

If only one of the two indexes were available, the query plans might still use it but the plan would not be optimal, due to the difference in column order in the index and the mismatch with the optimal column order needed for the query. The index would still be "covering" the query but the plan would probably be a full index scan instead of an index seek and would have additional (sort) operations as the rows provided by the index would not be in the order the query wants.

For the question now:

Am I right in thinking that this isn't a sensible way of having it organised and it would make more sense to have a single index with all 3 columns indexed or does that depend on the way the database is being used?

No, it is a sensible way and I wouldn't expect a single index to cover queries that might choose either index otherwise.

But it depends on what queries run against the table. Any analysis should be based on the workload you have and the performance you expect. Analyze the queries (all of them or the ones most often run) and what indexes they use and what performance you get, before you decide dropping or adding an index.

Run tests in test installations with different indexing, if performance is not good. Or experiment (say with dropping an index), if you think that the index is not used.

For a quick and dirty advice, if the number of indexes on the table is small and the performance is good, don't bother changing anything. The extra load that indexes impose on write operations is usually small, unless you have a table with 10+ indexes or a table that is under extreme write load (thousands of writes per second).

Mysql – Index for date column along with multiple other columns in where clause

This is a "range: joining_date >= '2017-01-01' AND joining_date <= '2017-05-31'

This is sort of a range: department_no IN (5,4,7) . There are cases where it acts efficiently like =, there are cases where it should be considered a "range".

This is an =: department_no IN (5) . That is, the Optimizer turns it into department_no = 5, which has much more optimization potential.

http://mysql.rjweb.org/doc.php/index_cookbook_mysql says to put range columns last, not first in an index.

In your queries, it is hard to predict what is optimal.

If you are likely to sometimes have single-item INs, then you need

INDEX(department_no, joining_date) -- in THIS order

Also, newer versions of MySQL may work well with that index for the multi-item IN by leapfrogging across the table. So add that index.

As for

INDEX(joining_date, ...)

When using a 'range', that index will not get past joining_date to whatever other column(s) follow it. So you may as say just

INDEX(joining_date)

As for the GROUP BY... Neither of your queries has any hope of using an index for the GROUP BY. So there will be some form of extra effort after the WHERE -- a tmp table, a sort, something.

Hiding a column in a function month(joining_date) -- the Optimizer can't use the joining_date column an any index for this.
The WHERE left the results in some messed up order (assuming a multi-item IN)

Unrelated... I like this pattern:

AND joining_date >= '2017-01-01' 
AND joining_date  < '2017-01-01' + INTERVAL 1 MONTH

It avoids computing the end, leap year, DATE vs DATETIME vs DATETIME(6) issues, etc.

Bottom line: Have this one index:

INDEX(branch_name, department_no, joining_date)

Then get the EXPLAIN SELECT ... to see if it did the leapfrogging. If that fails, let's rethink.

Best Answer

Related Solutions

Sql-server – Similar indices in MS SQL Server

Mysql – Index for date column along with multiple other columns in where clause

Related Question