Postgresql – Lower() vs ilike on postgresql

ormpostgresql

What I want to do is to check if value of column equals some string, but case insensitive. Is there a performance difference when using ilike without wildcards (percents) and lower() ?

I was planning to create index on lower(column_name) and use lower(column_name) = value (value is lower case already) but it turns out that ORM I'm using doesn't allow me to use lower() in a pretty way (but has convenient method for ilike). Will column_name ilike 'value' use lower(column_name) index?

Best Answer

Measure it and see, using explain analyze on test data.

I would expect a significant difference between an expression index on lower(col) with lower(col) = lower('constant') vs use of col ilike 'constant', with the expression index on lower(col) the faster.

That's because Pg can do a bitmap index scan with the expression index, but for the ilike it'll have to do a seqscan. AFAIK a text_pattern_ops index (useful for LIKE) won't do you any good for ILIKE.

The other question is whether the difference is worth caring about, and that depends on your data and workload. Sometimes it's smarter to just choose the slower but easier option and spend yourtime/effort elsewhere.

Will column_name ilike 'value' use lower(column_name) index?

No. Nor will it use an index on lower(column_name) text_pattern_ops. AFAIK there is no way to index a case-insensitive pattern match (ILIKE or ~~*) on a case-sensitive text/varchar in PostgreSQL at this time.

I don't think the citext type changes this, though it does make things more convenient.

Related Solutions

PostgreSQL – Indexes: Integer vs String Performance

Short answer: integer is faster than varchar or text in every aspect. Won't matter much for small tables and / or short keys. The difference grows with the length of the keys and the number of rows.

string ... 20 characters long, which in memory is roughly 5x that of the integer (if an integer is 4 bytes, and the strings are pure ASCII at 1 byte per character, then this holds)

To be precise, character types (text or varchar) occupy exactly 21 bytes for 20 ASCII characters on disk and 23 bytes in RAM. Detailed assessment:

What is the overhead for varchar(n)?

Also important: COLLATION rules can make sorting character data more expensive - unlike numeric data types:

Would index lookup be noticeably faster with char vs varchar when all values are 36 chars

Index size is probably responsible for the lion share of performance difference in most cases. Consider the overhead per index tuple (basically the same as for a table): 4 bytes for the item identifier and 8 bytes for the index tuple header. So the index tuple for integer would amount to 20 bytes (including 4 bytes of alignment padding) and for varchar(20) with 20 ASCII characters it would be 36 bytes (also incl. padding). Details:

Configuring PostgreSQL for read performance

All the theory aside: it's best to just test:

Measure the size of a PostgreSQL table row

Postgres 9.5 introduced an optimization for sorting long strings of character data (key word "abbreviated keys"). But a bug in some C library functions on Linux forced the project to disable the feature for non-C collations in Postgres 9.5.2. Details in the release notes.

However, if you actually use Postgres enum types, most of these considerations are irrelevant, since those are implemented with integer values internally anyway. The manual:

An enum value occupies four bytes on disk.

Aside: varchar(255) used to make sense for early versions of SQL Server, which could use a more efficient data type internally up to the limit of 255 characters. But the odd length restriction of 255 characters has no special meaning in Postgres at all.

PostgreSQL – How to Use Array as Argument to VARIADIC Function

Pass the array as is using the VARIADIC key word in the call:

CREATE OR REPLACE FUNCTION json_extract_path_text(string citext, VARIADIC params text[])
  RETURNS text LANGUAGE sql IMMUTABLE AS
'SELECT json_extract_path_text(string::json, VARIADIC params)';

Call:

SELECT json_extract_path_text('{"f1":{"f2":1,"f3":"foo"}}', VARIADIC '{f1, f3}');

 json_extract_path_text
----------------------
 foo

The manual on VARIADIC:

Sometimes it is useful to be able to pass an already-constructed array to a variadic function; this is particularly handy when one variadic function wants to pass on its array parameter to another one. You can do that by specifying VARIADIC in the call:
SELECT mleast(VARIADIC ARRAY[10, -1, 5, 4.4]);

More details:

Select rows such that names match elements of input array for pgsql function

To be clear: Inside the function, the VARIADIC parameter is just another array, nothing special about it. VARIADIC being used in two separate function definitions makes the case a bit confusing. The solution is to use the same keyword VARIADIC for a related but different purpose in the function call. Don't get confused even more.

Aside: Do not quote the language name, it's an identifier.

Case-insensitive version with `citext`?

I am trying to make a case-insensitive version of json_extract_path_text(), using the citext module.

While the above function works, it's not case-insensitive at all. citext preserves original input, which is restored with the cast to text (or json) and can be mixed-case.

I would not use citext to begin with. Various corner case problems:

Index on column with data type citext not used

For your purpose I suggest calling the built-in json_extract_path_text() with lower(citext_value), which returns lower-case text, and lower-case the 2nd parameter ("path elements") as well, to make it actually case-insensitive:

SELECT json_extract_path_text(lower('{"F1":{"f2":1,"f3":"foo"}}'::citext)::json
                            , VARIADIC lower('{f1, F3}')::text[]);

Note the cast to text[] after lower().

Best Answer

Related Solutions

PostgreSQL – Indexes: Integer vs String Performance

PostgreSQL – How to Use Array as Argument to VARIADIC Function

Case-insensitive version with citext?

Related Question

Case-insensitive version with `citext`?