In terms of search capability and performance, what are the pros & cons of implementing tags as an array
or text
or full text field?
I am seeing limitations of capability in choosing to implement tags with an array, because it seems to me that if an array contains ['juicy fruit']
then someone searching for 'fruit'
will not find that record (for example, tags && ARRAY['fruit']
will not find it). The person who created that record would have had to enter something more like ['juicy fruit', 'juicy', 'fruit']
to have their record found with just a search for 'fruit'
or 'juicy'
. Whereas if I implement tags as text
, then a search for 'fruit'
will find 'juicy fruit'
, and further still, if I implement tags as full text then I will find 'juicy fruit'
when searching with the string 'fruits'
(plural). Furthermore, I think there will be no performance loss in going for full text search. Thoughts?
But maybe the whole point of tags is exact match.
Best Answer
You can get search functionality with arrays by using the
ANY/SOME
or generate_subscripts functions. However, the PostgreSQL documentation notes the following:So it looks like you're on the right track, because you're correct about not losing performance (you would more likely gain it) and the other methods allow greater flexibility. Tags themselves are about exact matches, but most sites have a way to search tags too (for instance, this very site).
If you implement tag storage as full text, that doesn't preclude matching, including regex, with simpler expressions such as
LIKE/ILIKE
. A common scenario is to store documents (the chunks of text being searched) astext
orvarying character
and store thetsvector
type necessary for full-text search in a separate column. See the PostgreSQL documentation for more on creating a full text index.The pros and cons should then be evident: using an
array
limits scalability and search options because it lacks text indexing, stemming and inflection support, but is easy to use when an exact match is desired. Using atext
datatype is more flexible when searching for strings. Using atext
datatype with a generatedtsvector
column provides the most flexibility and good performance in all scenarios, but requires more storage.