Postgresql – Full text search on two tsvector columns

existsfull-text-searchjoin;postgresql

I'm trying to do a full text search across two columns of two separate tables in PostgreSQL 9.4.2. The tables and columns are:

  • article.article_title
  • keyword.keyword

Where there is a many-to-many relationship between the two. I think I've got most of the way following this blog post. However, I am struggling to formulate the query across the two columns. This query is returning results:

SELECT distinct title, pub_date, web_id, city_points, city_name FROM (
  SELECT article.article_title as title,
  article.pub_date as pub_date,
  article.web_id as web_id,
  string_agg(st_asText(city.geom), ', ') as city_points,
  string_agg(city.city, ', ') as city_name,
  article.title_tsv ||
  keyword.keyword_tsv
  as document 
  from article, article_keywords, keyword, article_cities, city
  where article.id = article_keywords.article_id
  and keyword.id = article_keywords.keyword_id
  and article.id = article_cities.article_id
  and city.id = article_cities.city_id
  GROUP BY article.id, keyword.id) p_search
WHERE p_search.document @@ to_tsquery('putin');

But I think the query suffers many of the same deficiencies from a previous question I asked here.

I am also trying to return the following columns in the query:

  • article.pub_date
  • article.web_id
  • city.city_points
  • city.city_name

I am pretty new to full text search – and SQL in general.
Let me know if you need any more information.

Best Answer

In parts similar to my last answer:

SELECT a.id, a.article_title AS title, a.pub_date, a.web_id
       c.city_points, c.city_names
FROM   article a
LEFT   JOIN (
   SELECT ac.article_id AS id
        , string_agg(st_asText(c.geom), ', ') AS city_points
        , string_agg(c.city, ', ') AS city_names
   FROM   article_cities ac
   JOIN   city           c ON c.id = ac.city_id
   GROUP  BY 1
   ) c USING (id)
WHERE a.title_tsv @@ to_tsquery('putin')
OR    EXISTS (
   SELECT 1
   FROM   article_keywords ak
   JOIN   keyword          k ON k.id = ak.keyword_id
   WHERE  ak.article_id = a.id
   AND    k.keyword_tsv @@ to_tsquery('putin')
   );

What's new here: we are only interested in the existence of a matching keyword, we are not even displaying them. (You may want to rethink that: shouldn't it become obvious from the result why the row was found?)

So I am using an EXISTS semi-join as alternative WHERE clause. Related:

Once again we aggregate cities per article_id before joining to article, so we need no aggregate or DISTINCT in the outer SELECT since we do not multiply rows in the join.

And once again use LEFT [OUTER] JOIN to include articles that are not linked to any cities at all.

Be aware that a list of comma-separated tables combined with WHERE conditions to join them is effectively the same as [INNER] JOIN on a join condition. Basic in the manual here.