I am running PostgreSQL 9.3. The database is of all of the medical NPIs (national provider numbers). I have created the following indexes:
CREATE INDEX indiv_provider_business_mailing_address_state_name_string_index
ON individuals (provider_business_mailing_address_state_name);
CREATE INDEX indiv_provider_business_practice_location_address_state_name_string_index
ON individuals (provider_business_practice_location_address_state_name);
CREATE INDEX indiv_upper_provider_business_mailing_address_state_name_string_index
ON individuals (UPPER(provider_business_mailing_address_state_name));
CREATE INDEX indiv_upper_provider_business_practice_location_address_state_name_string_index
ON individuals (UPPER(provider_business_practice_location_address_state_name));
but the query still table scans:
Postgres Running
SELECT npi, provider_first_name,
provider_last_name_legal_name,
provider_organization_name_legal_business_name,
provider_first_line_business_mailing_address,
provider_second_line_business_mailing_address,
provider_business_mailing_address_city_name,
provider_business_mailing_address_state_name,
substr(provider_business_mailing_address_postal_code, 1, 5)
AS provider_business_mailing_address_postal_code,
provider_business_mailing_address_telephone_number,
provider_business_mailing_address_fax_number,
provider_first_line_business_practice_location_address,
provider_second_line_business_practice_location_address,
provider_business_practice_location_address_city_name,
provider_business_practice_location_address_state_name,
substr(provider_business_practice_location_address_postal_code, 1, 5)
AS provider_business_practice_location_address_postal_code,
provider_business_practice_location_address_telephone_number,
provider_business_practice_location_address_fax_number
FROM individuals
WHERE ((((provider_business_mailing_address_state_name = UPPER(regexp_replace(?, '\s+', ' ', 'g'))))
OR ((provider_business_practice_location_address_state_name = UPPER(regexp_replace(?, '\s+', ' ', 'g'))))))
LIMIT 100
Explain output:
Postgres Limit (cost=0.00..400.58 rows=100 width=141)
-> Seq Scan on individuals (cost=0.00..1210010.61 rows=302063 width=141)
Filter: ((provider_business_mailing_address_state_name = 'PA'::text) OR (provider_business_practice_location_address_state_name = 'PA'::text))
Any idea why?
UPDATE:
I changed the query as suggested by Craig Ringer (https://dba.stackexchange.com/a/118850/7924), but it is still table scanning:
Postgres Running
SELECT npi,
provider_first_name,
provider_last_name_legal_name,
provider_organization_name_legal_business_name,
provider_first_line_business_mailing_address,
provider_second_line_business_mailing_address,
provider_business_mailing_address_city_name,
provider_business_mailing_address_state_name,
substr(provider_business_mailing_address_postal_code, 1, 5)
AS provider_business_mailing_address_postal_code,
provider_business_mailing_address_telephone_number,
provider_business_mailing_address_fax_number,
provider_first_line_business_practice_location_address,
provider_second_line_business_practice_location_address,
provider_business_practice_location_address_city_name,
provider_business_practice_location_address_state_name,
substr(provider_business_practice_location_address_postal_code, 1, 5)
AS provider_business_practice_location_address_postal_code,
provider_business_practice_location_address_telephone_number,
provider_business_practice_location_address_fax_number
FROM individuals
WHERE ((((provider_business_mailing_address_state_name = ?))
OR ((provider_business_practice_location_address_state_name = ?))))
LIMIT 100
Explain output:
Postgres Limit (cost=0.00..400.58 rows=100 width=141)
-> Seq Scan on individuals (cost=0.00..1210010.61 rows=302063 width=141)
Filter: ((provider_business_mailing_address_state_name = 'PA'::text) OR (provider_business_practice_location_address_state_name = 'PA'::text))
Best Answer
Your expression indexes aren't used, or usable.
You're indexing an expression that doesn't match what you're querying. The expression is on the parameter side, e.g.:
PostgreSQL is constant-folding that based on the substituted parameter into, in the case you've supplied, the literal
'PA'
. Then looking it up against the columnprovider_business_mailing_address_state_name
.If you intended to search against the uppercased, trimmed version of the
provider_business_mailing_address_state_name
column, like your expression index implies, you'd have to use the expression in the SQL too, e.g.but frankly, in this case I suggest normalizing the data in the table in-place instead of relying on an expression index.