Postgresql – Getting duplicate results in many to many query

aggregatecountmany-to-manypostgresql

I have designed a bookstore app with these tables. There's a book with an M2M relation to author. Lastly, there's table reader which keeps track of which user has read the book i.e. an entry in the read table means that book has been read by the corresponding user.

Relation defined in SQLFiddle

I want to select the books with their author names and also keep a count of how many people have read a particular book. Here's my query :

Question : Count of the number of readers for a book is wrong (coincidentally equal to the number of authors). For a book with more than 1 reader, the author names are repeated.

  SELECT b.id, b.title, COUNT(r.user1_id) AS read_ct, 
         array_agg(author.name)
  FROM book b
  LEFT OUTER JOIN reader r ON r.book_id = b.id      
  LEFT OUTER JOIN book_author ba ON ba.book_id = b.id    
  LEFT OUTER JOIN author ON author.id = ba.author_id          
  GROUP BY b.id

Which of these solutions is better ?

Solution 1 : Use `DISTINCT clause i.e.

 SELECT b.id, b.title, COUNT(DISTINCT r.user1_id) AS read_ct, 
     array_agg(DISTINCT author.name)

Solution 2 : Use subquery

 SELECT s.id, s.title, s.names, COUNT(r.used1_id) AS read_ct
 FROM (
   SELECT b.id, b.title, array_agg(author.name) AS names
   FROM book b
   LEFT OUTER JOIN book_author ba ON ba.book_id = b.id    
   LEFT OUTER JOIN author ON author.id = ba.author_id          
   GROUP BY b.id
 ) AS s
 LEFT OUTER JOIN reader r ON r.book_id = s.id
 GROUP BY s.id, s.title, s.names      

Best Answer

Objective: Count book's readers. Display book title, count, authors.

  • JOIN the (Subquery: book.array_agg(authors))
  • TO (Subquery: count book readers)
  • TO Book for id, title.

I think the logical error, as you mentioned Cartesian product, was aggregating two different aggregations at the same time...

Something like this should work:

PostgreSQL:

SELECT id, title, read_ct.Readers, Authors.Names
FROM book b
JOIN ( SELECT book_id, array_agg( name) as Names
       FROM book_author ba
       JOIN author a
       ON a.id = ba.author_id
       GROUP by book_id
      ) Authors
ON Authors.book_id = b.id
JOIN (
    SELECT book_id, count(r.user1_id) Readers
    FROM reader r
    GROUP BY book_id ) read_ct
on read_ct.book_id = b.id

SQL Server:

SELECT id, title, read_ct.Readers, STUFF(
                (SELECT ', ', a.name
                FROM book_author ba
                    JOIN author a
                    ON a.id = ba.author_id
                WHERE ba.book_id = b.id
                FOR XML PATH(''), TYPE)
                .value('.', 'varchar(max)'), 1, 2, '')
    FROM book b
    JOIN (
        SELECT book_id, count(r.user1_id) Readers
        FROM reader r
        GROUP BY book_id ) read_ct
    on read_ct.book_id = b.id