Short answer: integer
is faster than varchar
or text
in every aspect. Won't matter much for small tables and / or short keys. The difference grows with the length of the keys and the number of rows.
string ... 20 characters long, which in memory is roughly 5x that of
the integer (if an integer is 4 bytes, and the strings are pure ASCII
at 1 byte per character, then this holds)
To be precise, character types (text
or varchar
) occupy exactly 21 bytes for 20 ASCII characters on disk and 23 bytes in RAM. Detailed assessment:
Also important: COLLATION
rules can make sorting character data more expensive - unlike numeric data types:
Index size is probably responsible for the lion share of performance difference in most cases. Consider the overhead per index tuple (basically the same as for a table): 4 bytes for the item identifier and 8 bytes for the index tuple header. So the index tuple for integer
would amount to 20 bytes (including 4 bytes of alignment padding) and for varchar(20)
with 20 ASCII characters it would be 36 bytes (also incl. padding). Details:
All the theory aside: it's best to just test:
Postgres 9.5 introduced an optimization for sorting long strings of character data (key word "abbreviated keys"). But a bug in some C library functions on Linux forced the project to disable the feature for non-C collations in Postgres 9.5.2. Details in the release notes.
However, if you actually use Postgres enum
types, most of these considerations are irrelevant, since those are implemented with integer
values internally anyway. The manual:
An enum
value occupies four bytes on disk.
Aside: varchar(255)
used to make sense for early versions of SQL Server, which could use a more efficient data type internally up to the limit of 255 characters. But the odd length restriction of 255 characters has no special meaning in Postgres at all.
Pass the array as is using the VARIADIC
key word in the call:
CREATE OR REPLACE FUNCTION json_extract_path_text(string citext, VARIADIC params text[])
RETURNS text LANGUAGE sql IMMUTABLE AS
'SELECT json_extract_path_text(string::json, VARIADIC params)';
Call:
SELECT json_extract_path_text('{"f1":{"f2":1,"f3":"foo"}}', VARIADIC '{f1, f3}');
json_extract_path_text
----------------------
foo
The manual on VARIADIC
:
Sometimes it is useful to be able to pass an already-constructed array
to a variadic function; this is particularly handy when one variadic
function wants to pass on its array parameter to another one. You can
do that by specifying VARIADIC
in the call:
SELECT mleast(VARIADIC ARRAY[10, -1, 5, 4.4]);
More details:
To be clear: Inside the function, the VARIADIC
parameter is just another array, nothing special about it. VARIADIC
being used in two separate function definitions makes the case a bit confusing. The solution is to use the same keyword VARIADIC
for a related but different purpose in the function call. Don't get confused even more.
Aside: Do not quote the language name, it's an identifier.
Case-insensitive version with citext
?
I am trying to make a case-insensitive version of json_extract_path_text()
, using the citext module.
While the above function works, it's not case-insensitive at all. citext
preserves original input, which is restored with the cast to text
(or json
) and can be mixed-case.
I would not use citext
to begin with. Various corner case problems:
For your purpose I suggest calling the built-in json_extract_path_text()
with lower(citext_value)
, which returns lower-case text
, and lower-case the 2nd parameter ("path elements") as well, to make it actually case-insensitive:
SELECT json_extract_path_text(lower('{"F1":{"f2":1,"f3":"foo"}}'::citext)::json
, VARIADIC lower('{f1, F3}')::text[]);
Note the cast to text[]
after lower()
.
Best Answer
Measure it and see, using
explain analyze
on test data.I would expect a significant difference between an expression index on
lower(col)
withlower(col) = lower('constant')
vs use ofcol ilike 'constant'
, with the expression index onlower(col)
the faster.That's because Pg can do a bitmap index scan with the expression index, but for the
ilike
it'll have to do a seqscan. AFAIK atext_pattern_ops
index (useful forLIKE
) won't do you any good forILIKE
.The other question is whether the difference is worth caring about, and that depends on your data and workload. Sometimes it's smarter to just choose the slower but easier option and spend yourtime/effort elsewhere.
No. Nor will it use an index on
lower(column_name) text_pattern_ops
. AFAIK there is no way to index a case-insensitive pattern match (ILIKE
or~~*
) on a case-sensitive text/varchar in PostgreSQL at this time.I don't think the
citext
type changes this, though it does make things more convenient.