Sql-server – How to replicate SQL Server index INCLUDE and STATISTICS functionality on PostgreSQL

index, postgresql, sql-server, statistics

I'm working on a project that must support two database engines: SQL Server and PostgreSQL.

We are using NHibernate as the ORM.

We are running into performance issues with certain queries. Using SQL Server tools we've come up with several new indexes and statistics that greatly improve performance on SQL Server. However, I'm not certain how to implement the same indexes and statistics on PostgreSQL.

Two examples are:

    CREATE STATISTICS [perfStat_Answer_02] ON [dbo].[Answer] 
    ([InclusionExpressionGroupId], [QuestionId], [AnswerId])

    CREATE NONCLUSTERED INDEX [perf_Answer_01] ON [dbo].[Answer] 
    (
        [QuestionId] ASC
    )
    INCLUDE ( 
            [AnswerId],
            [InclusionExpressionGroupId],
            [AnswerConceptId],
            [Revision],
            [AnswerText],
            [AnswerOrder]
    )
    WITH (
            SORT_IN_TEMPDB = OFF
            , IGNORE_DUP_KEY = OFF 
            , DROP_EXISTING = OFF
            , ONLINE = OFF)
    ON [PRIMARY]

What is the syntax for the INCLUDEd fields in PostgreSQL, if such a feature exists?

How do we add statistics?

Reading the PostgreSQL docs, I'm not convinced that either is supported. However, I would like to know if there is any way to accomplish something similar.

Best Answer

I don't know what CREATE STATISTICS does, but in PostgreSQL statistics for the optimizer are collected by the ANALYZE command, which autovacuum runs automatically. Autovacuum is turned on by default.

Statistics are always collected for all columns; there is no need to turn this on specifically.
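To confirm what has been collected, ANALYZE can also be run by hand and the results read from the pg_stats view. This is only a sketch: it assumes the table from the question was created with an unquoted name (so PostgreSQL folded it to lower case) and lives in the public schema; adjust both if the NHibernate mapping quotes identifiers.

```sql
-- Assumption: unquoted table name "answer" in schema "public".
ANALYZE answer;

-- Per-column statistics the planner will use.
SELECT attname, null_frac, n_distinct, most_common_vals
FROM pg_stats
WHERE schemaname = 'public'
  AND tablename = 'answer';
```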

You can control the level of details collected for the statistics on a per-column basis using ALTER TABLE ... ALTER COLUMN column SET STATISTICS integer.
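For example, raising the statistics target on the join column might look like this. The column name is assumed to be the lower-cased form of the one in the question, and 500 is an arbitrary illustrative value (the default target is 100).

```sql
-- Assumption: unquoted identifiers; 500 is an illustrative target, not a recommendation.
ALTER TABLE answer ALTER COLUMN questionid SET STATISTICS 500;

-- The new target only takes effect the next time the table is analyzed.
ANALYZE answer;
```

Setting the target to -1 returns the column to the system-wide default_statistics_target.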

An index in PostgreSQL is always non-clustered, so I'd assume that the above index maps to a regular index on the QuestionId column.
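A minimal sketch of that mapping, again assuming unquoted (lower-cased) identifiers:

```sql
-- B-tree is the default index type and ASC the default ordering,
-- so neither needs to be spelled out.
CREATE INDEX perf_answer_01 ON answer (questionid);
```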

Not sure about the INCLUDE part. I assume it is there to support index-only retrieval when the optimizer chooses that index. PostgreSQL releases before 9.2 have no index-only retrieval, so there is no equivalent technique on those versions; index-only scans arrived in 9.2, and an INCLUDE clause for B-tree indexes in 11.
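On a recent enough server, the two statements from the question could be approximated roughly as below. Treat this as a sketch under stated assumptions rather than a drop-in translation: INCLUDE requires PostgreSQL 11 or later, CREATE STATISTICS requires PostgreSQL 10 or later, the identifiers are assumed to be unquoted, and the index name is made up so that it does not collide with a plain index on the same column.

```sql
-- PostgreSQL 11+: covering index; the INCLUDE columns are stored in the
-- index but are not part of the search key.
CREATE INDEX perf_answer_01_covering ON answer (questionid)
    INCLUDE (answerid, inclusionexpressiongroupid, answerconceptid,
             revision, answertext, answerorder);

-- PostgreSQL 10+: extended (multi-column) statistics.
CREATE STATISTICS perfstat_answer_02 (ndistinct, dependencies)
    ON inclusionexpressiongroupid, questionid, answerid
    FROM answer;

-- Extended statistics are populated by the next ANALYZE.
ANALYZE answer;
```

Whether the planner actually picks an index-only scan also depends on the table's visibility map, so a recently vacuumed table is more likely to benefit.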

Related Solutions

Sql-server – SQL Server 2008 query planner failing after index drop & recreate

Well, the issue is now resolved:

It would seem logical to use filtered indexes (NOT NULL) to reduce database size and, as so many sources on the web say, to increase performance. The reality, it seems, is something else entirely.

In layman's terms, the SQL Server query planner resolves even your basic inner joins without making any assumptions as to the content of the columns. Even though NULL values do not form a join, they must be included in the column index in order for the query planner to use it, unless the query says otherwise with a predicate such as WHERE joinCol_ID IS NOT NULL. Basically, SQL Server does not use filtered indexes for joins at all unless the queries themselves are modified to account for the filter value. Instead, it will create new statistics on these columns and/or use a clustered index scan or other indexes that include the column, whichever it deems most effective. Using filtered indexes on foreign keys is therefore an absolutely horrid idea.
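The behavior described above can be sketched in T-SQL as follows. The tables dbo.Parent and dbo.Child, their columns, and the index name are all hypothetical, chosen only to illustrate the pattern.

```sql
-- Hypothetical schema: dbo.Child.ParentId is a nullable foreign key to dbo.Parent.
CREATE NONCLUSTERED INDEX IX_Child_ParentId_Filtered
    ON dbo.Child (ParentId)
    WHERE ParentId IS NOT NULL;

-- Plain join: per the explanation above, the filtered index is not considered,
-- because the query says nothing about NULLs.
SELECT c.ChildId, p.Name
FROM dbo.Child AS c
JOIN dbo.Parent AS p ON p.ParentId = c.ParentId;

-- Same join with the filter predicate repeated, which makes the
-- filtered index eligible.
SELECT c.ChildId, p.Name
FROM dbo.Child AS c
JOIN dbo.Parent AS p ON p.ParentId = c.ParentId
WHERE c.ParentId IS NOT NULL;
```

Comparing the actual execution plans of the two queries is the simplest way to check which index is used in a given environment.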

We still have no idea how months' worth of testing in multiple other environments never produced the same results outside of this one single DB, but this is the way it's supposed to work. Apparently something that, as far as we know, is not related to cache, statistics or configuration caused the production DB to behave differently and correctly detect and use the filtered indexes, while all of the testing environments simply used the old indexes. Since the indexes were dropped and recreated with the same name, this seems a valid theory even if there is no real proof.

So the lesson of the story: The web is filled with examples of how underused filtered indexes are, how awesome they can be. But this serious downside never popped up except as a nagging thought in the back of my head saying "if these are so great, then why aren't NULL values filtered out of indexes by default, since they only take up space and only serve a purpose in special circumstances"? Well, now I know why. :)

Sql-server – Differences Between Two Different Create Index Commands

It boils down to looking at what the default values are. Let's break this down:

    CREATE UNIQUE NONCLUSTERED INDEX [DEID_MAP_IDX1] ON [dbo].[DEID_MAP]

NONCLUSTERED is specified here. It is also the default: unless CLUSTERED is specified, the index will be nonclustered. So that's the same in both scripts.

[dbo] is specified here explicitly. For the second CREATE INDEX, which does not specify a schema, it all depends on what the current user's default schema is. Only you can answer that at the moment, so it may or may not default to dbo.

    WITH (
        PAD_INDEX  = OFF, 
        STATISTICS_NORECOMPUTE  = OFF, 
        IGNORE_DUP_KEY  = OFF, 
        ALLOW_ROW_LOCKS = ON, 
        ALLOW_PAGE_LOCKS = ON
    ) ON [PRIMARY]

PAD_INDEX: the default is OFF, so unspecified will be the same in the second script as it is in the first.

STATISTICS_NORECOMPUTE: the default is OFF, so the second script unspecified has the same value.

IGNORE_DUP_KEY: the default is OFF, so the second CREATE INDEX is identical with this parameter.

ALLOW_ROW_LOCKS: the default is ON, so the second CREATE script has the same behavior.

ALLOW_PAGE_LOCKS: the default is ON...the second script has identical behavior.

... ON [PRIMARY]: just like the default schema one, this all depends on what your default filegroup is. If PRIMARY is the default filegroup, your second CREATE INDEX script will also create the index on PRIMARY. If PRIMARY is not the default filegroup, then it will be a different filegroup, as an unspecified filegroup will go to the default filegroup.
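The two environment-dependent points (default schema and default filegroup) and the resulting index options can be checked from the catalog views. The queries below are a sketch that assumes the index already exists on dbo.DEID_MAP and that you run them as the user who would execute the second script.

```sql
-- Default schema of the current user.
SELECT SCHEMA_NAME() AS default_schema;

-- Default filegroup of the current database.
SELECT name AS default_filegroup
FROM sys.filegroups
WHERE is_default = 1;

-- Options actually in effect for each index on the table.
SELECT i.name, i.type_desc, i.is_unique, i.is_padded, i.ignore_dup_key,
       i.allow_row_locks, i.allow_page_locks, s.no_recompute
FROM sys.indexes AS i
JOIN sys.stats AS s
  ON s.object_id = i.object_id
 AND s.stats_id = i.index_id   -- index statistics share the index's id
WHERE i.object_id = OBJECT_ID(N'dbo.DEID_MAP');
```

Running the last query after each script, on a scratch copy of the table, shows directly whether the two CREATE INDEX commands produce the same index.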

All of this information, including the default values, can be found in the Books Online (BOL) reference for CREATE INDEX.
