Postgresql – Postgres primary key sorted in the reverse order. Will it use the index efficiently

indexpostgresqlprimary-keysorting

My Rails database is backed by Postgres database. As you may know, each table in Rails gets assigned a primary key which is of Integer type and is indexed.

My list views show records in the reverse chronological order. So I just sort my result set in the reverse order of the primary key.

My question is: will the query use the primary key index? And if yes then as efficiently? How do I verify that?

Thanks in advance for your time.

Bharat

Best Answer

Indexes are typically unidirectional. This allows improvements in space and index scanning speed as back links don't need to be maintained. As @a_horse_with_no_name has noted, Postgres can and will be used to search in the reverse order to the index. (If this was not a unique index with no NULLs, positioning of NULLS FIRST/LAST could be a consideration.) Performance may not be as fast as using a non-standard index ordering, and the documentation notes significant speedups can be obtained with non-standard orderings.

For systems that don't support scanning indexes backwards, if your index is ascending, it would not be used for your query. There are two options in this case:

add another index with DESC specified; or
replace the primary key index with one which is has DESC specified. (I haven't tried this, and likely wouldn't as it breaks my expectations for primary keys.)

Related Solutions

Sql-server – Storage order vs Result order

Let me try to explain why you should not do that, why you should never assume that an SQL-product will return a result set in a specific order, unless you specify so, whatever indices - clustered or non-clustered, B-trees or R-Trees or k-d-trees or fractal-trees or whatever other exotic indices a DBMS is using.

Your original query tells to the DBMS to search the SensorValues table, find rows that match the 3 conditions, order those rows by Date descending, keep only the first row from those and - finally - select and return only the SensorValue column.

SELECT TOP 1 SensorValue
  FROM SensorValues
  WHERE SensorId = 53
    AND DeviceId = 3819
    AND Date < 1339225010
  ORDER BY Date DESC ;

These are very specific orders you have given to the DBMS and the result will most probably be the same every time you run the query (there is a chance it might not, if you have more than one row that match the conditions and have the same max Date but different SensorValue but lets assume for the rest of the conversation that no such rows exist in your table).

Does the DBMS have to do this, to run this query, the exact way I describe it above? No, of course not and you know that. It may not read the table but read from an index. Or it may use two indexes if it thinks it's better (faster). Or three. Or it may use a cached result (not SQL Server but other DBMS cache query results). Or it may use parallel execution one time and not the next time it runs. Or ... (add any other feature that affects execution and execution plans).

What is guaranteed though is that it will return the exact same result, every time you run it - as long as no rows are inserted, deleted or updated.

Now lets see what your suggestion says:

SELECT TOP 1 SensorValue
  FROM SensorValues
  WHERE SensorId = 53
    AND DeviceId = 3819
    AND Date < 1339225010 ;

This query tells to the DBMS to search the SensorValues table, find rows that match the 3 conditions, ~~order those rows by Date descending,~~, don't care about the order, keep only one row and - finally - select and return only the SensorValue column.

So, it basically tells the same as the first one, except that it tells that you want one result only that matches the conditions and you don't care which one.

Now, can we assume that it will give always the same result because of the clustered index?
- If it does use this clustered index every time, yes.

But will it use it?
- No.

Why not?
- Beacuse it can. The query optimizer is free to choose a path of execution every time it runs a statement. Whatever path it sees fit at that time for that statement.

But isn't using the clustered index the best/fastest way to get results?
- No, not always. It might be the first time you run the query. The second time, it may use a cached result (if the DBMS has such a feature, not SQL Server^*). The 1000th time the result may have been removed from the cache and another result may exist there. Say, you had executed this query just before:

SELECT TOP 1 SensorValue
  FROM SensorValues
  WHERE SensorId = 53
    AND DeviceId = 3819
    AND Date < 1339225010
  ORDER BY Date ASC ;         --- Notice the `ASC` here

and the cached result (from the above query) is another, different one that still matches your conditions but is not the first in your (wanted) ordering. And you have told the DBMS not to care about the order.

OK, so only cache can affect this?
- No, many other things, too.

other indexes were considered, at that time by the DBMS as better for this query.
a developer changed or completely removed this clustered index you had.
you or some other developer added another index that the optimizer decided it's more efficient to use than the CI.
you updated to a new version and the new optimizer has a minor bug or a change in how it ranks and chooses execution plans.
statistics were updated.
parallel execution was chosen instead.

^{*: SQL Server does not cache query results but the Enterprise Edition does have an Advanced Scanning feature which is kind of similar in that you may get different results because of concurrent queries. Not sure exactly when this kicks in though. (thnx @Martin Smith for the tip.)}

I hope you are convinced that you should never rely that an SQL query will return results in a specific order, unless you specify so. And never use TOP (n) without ORDER BY, unless of course you just want n rows in the result and you don't care which ones are returned.

Sql-server – SQL Server 2008 R2 Missing Index

The primary key in SQL Server is by default also the clustering key - and the data is physically ordered by the clustering key in a table.

So if this primary key is listed, maybe it's because it is a non-clustered primary key - something that you have to do explicitly - but it's entirely possible.

The primary key is a logical construct - it's job is to make sure each row in a table can be uniquely and reliably identified. It has no impact whatsoever on the physical structuring of the table.

The clustering key (and thus the clustered index) is what determines the physical ordering of the data in a SQL Server table. The table's primary key is the clustering key by default - but it doesn't have to be - you can have two totally separate sets of columns for the primary key and the clustering key.

But all these things aside: the DMV's that identify missing indexes aren't perfect - do take their recommendations with a grain of salt! Don't just blindly implement everything that DMV recommends - sometimes it recommends stuff that's already in place, or some of its recommendations are even contra-productive! Don't blindly trust it - it's a helping tool - but it's not perfect!

Best Answer

Related Solutions

Sql-server – Storage order vs Result order

Sql-server – SQL Server 2008 R2 Missing Index

Related Question