Postgresql – Why is this PostgreSQL query table scanning

index-tuningpostgresql

I am running PostgreSQL 9.3. The database is of all of the medical NPIs (national provider numbers). I have created the following indexes:

CREATE INDEX indiv_provider_business_mailing_address_state_name_string_index
  ON individuals (provider_business_mailing_address_state_name);
CREATE INDEX indiv_provider_business_practice_location_address_state_name_string_index
  ON individuals (provider_business_practice_location_address_state_name);
CREATE INDEX indiv_upper_provider_business_mailing_address_state_name_string_index
  ON individuals (UPPER(provider_business_mailing_address_state_name));
CREATE INDEX indiv_upper_provider_business_practice_location_address_state_name_string_index
  ON individuals (UPPER(provider_business_practice_location_address_state_name));

but the query still table scans:

Postgres Running
  SELECT npi, provider_first_name,
    provider_last_name_legal_name,
    provider_organization_name_legal_business_name,
    provider_first_line_business_mailing_address,
    provider_second_line_business_mailing_address,
    provider_business_mailing_address_city_name,
    provider_business_mailing_address_state_name,
    substr(provider_business_mailing_address_postal_code, 1, 5)
      AS provider_business_mailing_address_postal_code,
    provider_business_mailing_address_telephone_number,
    provider_business_mailing_address_fax_number,
    provider_first_line_business_practice_location_address,
    provider_second_line_business_practice_location_address,
    provider_business_practice_location_address_city_name,
    provider_business_practice_location_address_state_name,
    substr(provider_business_practice_location_address_postal_code, 1, 5)
      AS provider_business_practice_location_address_postal_code,
    provider_business_practice_location_address_telephone_number,
    provider_business_practice_location_address_fax_number
  FROM individuals
  WHERE ((((provider_business_mailing_address_state_name = UPPER(regexp_replace(?, '\s+', ' ', 'g'))))
    OR ((provider_business_practice_location_address_state_name = UPPER(regexp_replace(?, '\s+', ' ', 'g'))))))
  LIMIT 100

Explain output:

Postgres Limit  (cost=0.00..400.58 rows=100 width=141)
  ->  Seq Scan on individuals  (cost=0.00..1210010.61 rows=302063 width=141)
        Filter: ((provider_business_mailing_address_state_name = 'PA'::text) OR (provider_business_practice_location_address_state_name = 'PA'::text))

Any idea why?

UPDATE:

I changed the query as suggested by Craig Ringer (https://dba.stackexchange.com/a/118850/7924), but it is still table scanning:

Postgres Running
  SELECT npi,
    provider_first_name,
    provider_last_name_legal_name,
    provider_organization_name_legal_business_name,
    provider_first_line_business_mailing_address,
    provider_second_line_business_mailing_address,
    provider_business_mailing_address_city_name,
    provider_business_mailing_address_state_name,
    substr(provider_business_mailing_address_postal_code, 1, 5)
      AS provider_business_mailing_address_postal_code,
    provider_business_mailing_address_telephone_number,
    provider_business_mailing_address_fax_number,
    provider_first_line_business_practice_location_address,
    provider_second_line_business_practice_location_address,
    provider_business_practice_location_address_city_name,
    provider_business_practice_location_address_state_name,
    substr(provider_business_practice_location_address_postal_code, 1, 5)
      AS provider_business_practice_location_address_postal_code,
    provider_business_practice_location_address_telephone_number,
    provider_business_practice_location_address_fax_number
  FROM individuals
  WHERE ((((provider_business_mailing_address_state_name = ?))
    OR ((provider_business_practice_location_address_state_name = ?))))
  LIMIT 100

Explain output:

Postgres Limit  (cost=0.00..400.58 rows=100 width=141)
  ->  Seq Scan on individuals  (cost=0.00..1210010.61 rows=302063 width=141)
    Filter: ((provider_business_mailing_address_state_name = 'PA'::text) OR (provider_business_practice_location_address_state_name = 'PA'::text))

Best Answer

Your expression indexes aren't used, or usable.

You're indexing an expression that doesn't match what you're querying. The expression is on the parameter side, e.g.:

UPPER(regexp_replace(?, '\s+', ' ', 'g'))

PostgreSQL is constant-folding that based on the substituted parameter into, in the case you've supplied, the literal 'PA'. Then looking it up against the column provider_business_mailing_address_state_name.

If you intended to search against the uppercased, trimmed version of the provider_business_mailing_address_state_name column, like your expression index implies, you'd have to use the expression in the SQL too, e.g.

OR ((UPPER(regexp_replace(provider_business_practice_location_address_state_name, '\s+', ' ', 'g')) = UPPER(regexp_replace(?, '\s+', ' ', 'g'))))))

but frankly, in this case I suggest normalizing the data in the table in-place instead of relying on an expression index.