The below SQL query runs in 2s:
SELECT
*
FROM
fb_name
INNER JOIN fb_kg ON mid = subject_mid
WHERE
LOWER(alias) = LOWER('Polaski')
AND fb_kg.relation = 'location/location/containedby'
The below SQL query runs in 2ms:
SELECT
*
FROM
fb_name
INNER JOIN fb_kg ON mid = subject_mid
WHERE
alias = 'Polaski'
AND fb_kg.relation = 'location/location/containedby'
The difference between the two queries is the inclusion of the LOWER function.
There has been an index created for both queries:
CREATE INDEX fb_name_lower_alias ON fb_name (lower(alias));
CREATE INDEX fb_name_mid_lower_alias_index ON fb_name (mid, lower(alias));
CREATE INDEX fb_name_alias ON fb_name (alias);
CREATE INDEX fb_name_mid_alias_index ON fb_name (mid, alias);
EXPLAIN for LOWER:
"Gather (cost=247515.73..249133.22 rows=5708 width=70)"
" Workers Planned: 2"
" -> Merge Join (cost=246515.73..247562.42 rows=2378 width=70)"
" Merge Cond: ((fb_kg.subject_mid)::text = (fb_name.mid)::text)"
" -> Sort (cost=205352.30..205794.73 rows=176972 width=44)"
" Sort Key: fb_kg.subject_mid"
" -> Parallel Bitmap Heap Scan on fb_kg (cost=14116.24..184480.39 rows=176972 width=44)"
" Recheck Cond: ((relation)::text = 'location/location/containedby'::text)"
" -> Bitmap Index Scan on relation_index (cost=0.00..14010.06 rows=424733 width=0)"
" Index Cond: ((relation)::text = 'location/location/containedby'::text)"
" -> Sort (cost=41163.43..41232.45 rows=27611 width=26)"
" Sort Key: fb_name.mid"
" -> Bitmap Heap Scan on fb_name (cost=750.54..39126.71 rows=27611 width=26)"
" Recheck Cond: (lower((alias)::text) = 'polaski'::text)"
" -> Bitmap Index Scan on fb_name_lower_alias_varchar_pattern_ops (cost=0.00..743.64 rows=27611 width=0)"
" Index Cond: (lower((alias)::text) = 'polaski'::text)"
Other EXPLAIN:
"Nested Loop (cost=11.45..13502.52 rows=2 width=70)"
" -> Bitmap Heap Scan on fb_name (cost=4.64..48.23 rows=11 width=26)"
" Recheck Cond: ((alias)::text = 'Polaski'::text)"
" -> Bitmap Index Scan on fb_name_alias_index (cost=0.00..4.64 rows=11 width=0)"
" Index Cond: ((alias)::text = 'Polaski'::text)"
" -> Bitmap Heap Scan on fb_kg (cost=6.81..1223.04 rows=8 width=44)"
" Recheck Cond: ((subject_mid)::text = (fb_name.mid)::text)"
" Filter: ((relation)::text = 'location/location/containedby'::text)"
" -> Bitmap Index Scan on subject_mid_index (cost=0.00..6.81 rows=316 width=0)"
" Index Cond: ((subject_mid)::text = (fb_name.mid)::text)"
CREATE TABLE definitions
CREATE TABLE fb_kg
(object_mid varchar NOT NULL,
relation varchar NOT NULL,
subject_mid varchar NOT NULL,
PRIMARY KEY(object_mid, relation, subject_mid));
CREATE TABLE fb_name
(mid varchar NOT NULL,
alias varchar NOT NULL,
PRIMARY KEY(mid, alias));
How come with the same indexes the LOWER function is about 60x slower?
Best Answer
Using
citext
My suggestion is to move to the citext module.
Then simply stop using
lower()
everywhere. Your query should like this,Everything else should work the same. Paste the
EXPLAIN ANALYZE
of the new query.Update on indexes, and case-sensitive searching on
citext
If you need a case-sensitive query on a
citext
column, you just have to cast the other side to::text
, and in fact using thecitext
index is usually good enough. For example you can do a case-sensitive index only scan withWHERE col::citext = 'fOo'::citext AND col::text = 'fOo'
; or, you could create two indexes on thecitext
column:ON table(myCITEXT)
ON table(myCITEXT::text)