PostgreSQL – List of Integers Separated by Comma vs Integer Array for Performance

arraypostgresql

As per subject, I have a field which is varchar in which i store some numbers separated by commas. The list grows and shrinks based on clients choices. I then use it in queries to check if a particular value is in that list against

value || ','

It is working well but was thinking of converting it to an integer array.

Would this be a better solution both in the technical/performance way and to properly program the database?

Best Answer

It's highly likely that the best approach will be a side-table of sometable(main_id, value) where you have a composite index on (main_id, value). This allows very fast lookups to see "for this mainid, does this value exist". This will let you enforce foreign key relationships. Unless you have a good reason, use this conventional relational approach.

Failing that, you can and should use an array field instead of a comma separated list. Using a comma-separated list is just downright horrid from a design point of view. It makes queries harder to write, more error prone, and forces you to do lots of slow and inefficient string manipulation and number parsing just for simple operations, and prevents any kind of integrity checking without very inefficient CHECK constaints or triggers. I think Bill nailed it with:

1,2,3,banana,5

With an array you can use the intarray extension to provide a GiST index that lets you quickly test if the array contains a given value using an indexable @> or <@ operation. You may want to add the btree_gist extension too, so you can create a composite GiST index of main_id, the_values_array, in case your queries are usually of the form:

WHERE main_id = blah AND the_values_array @> ARRAY[42]

(or have two separate indexes and see if it'll do a bitmap index scan).

You can't enforce a foreign key relationship into an array in PostgreSQL yet, though the feature seems to be on the way. You'd need to do it with somewhat complicated triggers in the mean time. Still, it's a lot better than a comma-separated list.