Postgresql – Getting duplicate results in many to many query

Tags: aggregate, count, many-to-many, postgresql

I have designed a bookstore app with these tables. There is a book table with a many-to-many (M2M) relation to author. Lastly, there is a reader table that tracks which users have read which books, i.e. an entry in the reader table means that the book has been read by the corresponding user.

Relation defined in SQLFiddle

I want to select the books with their author names and also count how many people have read each book.

Problem: The reader count for a book is wrong (coincidentally equal to the number of authors), and for a book with more than one reader the author names are repeated. Here's my query:

  SELECT b.id, b.title, COUNT(r.user1_id) AS read_ct, 
         array_agg(author.name)
  FROM book b
  LEFT OUTER JOIN reader r ON r.book_id = b.id      
  LEFT OUTER JOIN book_author ba ON ba.book_id = b.id    
  LEFT OUTER JOIN author ON author.id = ba.author_id          
  GROUP BY b.id

Which of these solutions is better ?

Solution 1 : Use the DISTINCT clause, i.e.

 SELECT b.id, b.title, COUNT(DISTINCT r.user1_id) AS read_ct, 
     array_agg(DISTINCT author.name)

Solution 2 : Use subquery

 SELECT s.id, s.title, s.names, COUNT(r.user1_id) AS read_ct
 FROM (
   SELECT b.id, b.title, array_agg(author.name) AS names
   FROM book b
   LEFT OUTER JOIN book_author ba ON ba.book_id = b.id    
   LEFT OUTER JOIN author ON author.id = ba.author_id          
   GROUP BY b.id
 ) AS s
 LEFT OUTER JOIN reader r ON r.book_id = s.id
 GROUP BY s.id, s.title, s.names
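To compare the two solutions on concrete data, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for PostgreSQL. It assumes SQLite's group_concat in place of array_agg, and the sample rows (including two different authors who happen to share a name) are hypothetical.

```python
import sqlite3

# Hypothetical data: one book, two different authors sharing a name, two readers.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book_author (book_id INTEGER, author_id INTEGER);
    CREATE TABLE reader (book_id INTEGER, user1_id INTEGER);
    INSERT INTO book VALUES (1, 'Family Recipes');
    INSERT INTO author VALUES (1, 'J. Smith'), (2, 'J. Smith');
    INSERT INTO book_author VALUES (1, 1), (1, 2);
    INSERT INTO reader VALUES (1, 10), (1, 11);
""")

# Solution 1: DISTINCT inside both aggregates (group_concat stands in for array_agg).
solution_1 = conn.execute("""
    SELECT COUNT(DISTINCT r.user1_id), group_concat(DISTINCT author.name)
    FROM book b
    LEFT OUTER JOIN reader r ON r.book_id = b.id
    LEFT OUTER JOIN book_author ba ON ba.book_id = b.id
    LEFT OUTER JOIN author ON author.id = ba.author_id
    GROUP BY b.id
""").fetchone()

# Solution 2: aggregate the authors in a subquery first, then count readers.
solution_2 = conn.execute("""
    SELECT COUNT(r.user1_id), s.names
    FROM (
        SELECT b.id, b.title, group_concat(author.name) AS names
        FROM book b
        LEFT OUTER JOIN book_author ba ON ba.book_id = b.id
        LEFT OUTER JOIN author ON author.id = ba.author_id
        GROUP BY b.id
    ) AS s
    LEFT OUTER JOIN reader r ON r.book_id = s.id
    GROUP BY s.id, s.title, s.names
""").fetchone()

print(solution_1)  # (2, 'J. Smith')
print(solution_2)  # (2, 'J. Smith,J. Smith')
```

Both give the right reader count here, but DISTINCT on the name merges the two same-named authors into one, while the subquery keeps both. Solution 1 also still builds the multiplied rows before de-duplicating them, which may matter on larger tables.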

Best Answer

Objective: count each book's readers and display the book title, reader count, and authors.

Approach: join a subquery that aggregates the authors per book (array_agg) to a subquery that counts the readers per book, and then to book for id and title.

I think the logical error is the Cartesian product you mentioned: two different aggregations were being computed over the same joined rows at the same time...
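The row multiplication is easy to see on a small example. The sketch below uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, with hypothetical sample data: one book, two authors, three readers. The join shape is the one from the question.

```python
import sqlite3

# Hypothetical sample data: one book with 2 authors and 3 readers.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book_author (book_id INTEGER, author_id INTEGER);
    CREATE TABLE reader (book_id INTEGER, user1_id INTEGER);
    INSERT INTO book VALUES (1, 'SQL Basics');
    INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO book_author VALUES (1, 1), (1, 2);
    INSERT INTO reader VALUES (1, 10), (1, 11), (1, 12);
""")

# Same joins as the question's query, but without GROUP BY,
# so every intermediate row is visible.
joined_rows = conn.execute("""
    SELECT b.id, r.user1_id, author.name
    FROM book b
    LEFT OUTER JOIN reader r ON r.book_id = b.id
    LEFT OUTER JOIN book_author ba ON ba.book_id = b.id
    LEFT OUTER JOIN author ON author.id = ba.author_id
""").fetchall()

# Aggregating over those rows counts each reader once per author.
read_ct = conn.execute("""
    SELECT COUNT(r.user1_id)
    FROM book b
    LEFT OUTER JOIN reader r ON r.book_id = b.id
    LEFT OUTER JOIN book_author ba ON ba.book_id = b.id
    LEFT OUTER JOIN author ON author.id = ba.author_id
    GROUP BY b.id
""").fetchone()[0]

print(len(joined_rows), read_ct)  # 6 6
```

Each of the 3 reader rows is paired with each of the 2 author rows, so the count comes out as 3 x 2 = 6 instead of 3. With a single reader the product equals the number of authors, which matches the symptom described in the question.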

Something like this should work:

PostgreSQL:

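-- Aggregate authors and readers separately, then join each result to book.
-- Inner joins drop books with no authors or no readers; use LEFT JOIN to keep them.
SELECT b.id, b.title, read_ct.Readers, Authors.Names
FROM book b
JOIN ( SELECT ba.book_id, array_agg(a.name) AS Names
       FROM book_author ba
       JOIN author a
       ON a.id = ba.author_id
       GROUP BY ba.book_id
      ) Authors
ON Authors.book_id = b.id
JOIN (
    SELECT r.book_id, count(r.user1_id) AS Readers
    FROM reader r
    GROUP BY r.book_id ) read_ct
ON read_ct.book_id = b.id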
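As a runnable illustration of the pre-aggregation idea, here is a sketch using Python's built-in sqlite3 module in place of PostgreSQL. It assumes group_concat instead of array_agg, hypothetical sample data, and one deliberate variation: LEFT JOIN with COALESCE, so that a book nobody has read still appears with a count of 0.

```python
import sqlite3

# Hypothetical data: book 1 has 2 authors and 3 readers; book 2 has 1 author, no readers.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book_author (book_id INTEGER, author_id INTEGER);
    CREATE TABLE reader (book_id INTEGER, user1_id INTEGER);
    INSERT INTO book VALUES (1, 'SQL Basics'), (2, 'Joins');
    INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO book_author VALUES (1, 1), (1, 2), (2, 3);
    INSERT INTO reader VALUES (1, 10), (1, 11), (1, 12);
""")

# Each subquery returns at most one row per book, so joining them
# to book cannot multiply rows.
rows = conn.execute("""
    SELECT b.id, b.title, COALESCE(read_ct.readers, 0), authors.names
    FROM book b
    LEFT JOIN (
        SELECT ba.book_id, group_concat(a.name, ', ') AS names
        FROM book_author ba
        JOIN author a ON a.id = ba.author_id
        GROUP BY ba.book_id
    ) authors ON authors.book_id = b.id
    LEFT JOIN (
        SELECT book_id, COUNT(user1_id) AS readers
        FROM reader
        GROUP BY book_id
    ) read_ct ON read_ct.book_id = b.id
    ORDER BY b.id
""").fetchall()

for row in rows:
    print(row)
```

Book 1 reports 3 readers and both authors, and book 2 reports 0 readers. With plain JOIN instead of LEFT JOIN, book 2 would be missing from the result.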

SQL Server:

SELECT id, title, read_ct.Readers, STUFF(
                (SELECT ', ', a.name
                FROM book_author ba
                    JOIN author a
                    ON a.id = ba.author_id
                WHERE ba.book_id = b.id
                FOR XML PATH(''), TYPE)
                .value('.', 'varchar(max)'), 1, 2, '')
    FROM book b
    JOIN (
        SELECT book_id, count(r.user1_id) Readers
        FROM reader r
        GROUP BY book_id ) read_ct
    on read_ct.book_id = b.id

Related Solutions

Ms-access – “No Current Record” when using COUNT() in Access 2010

Okay, well, since nobody else chimed in and I needed to get it done, I took one last shot at it. Not quite sure on the why of it all, but I used basically the entire query as a subquery in the FROM clause, and then ran the report against that, and that seemed to do it...

SELECT
  [VendorCusts].[VendorPaid],
  [VendorCusts].[DE?],
  [VendorCusts].[IC?],
  [VendorCusts].[AG?],
  [VendorCusts].[GB?],
  COUNT(VendorCustID) AS [CustCount]
FROM
  (
    SELECT DISTINCT
      [Plan Revenue Expense].[Check To] AS [VendorPaid],
      [Support Provider].[DE?],
      [Support Provider].[IC?],
      [Support Provider].[Agency?] AS [AG?],
      [Support Provider].[GeneralBus?] AS [GB?],
      Customer.CustID AS [VendorCustID],
      Customer.LName AS [VendorCustLName],
      Customer.FName AS [VendorCustFName]
    FROM
      (((Customer
      INNER JOIN Plan ON Customer.CustID=Plan.CustID)
      INNER JOIN [Plan Revenue] ON [Plan].[Plan ID]=[Plan Revenue].[PlanID])
      INNER JOIN [Plan Revenue Expense] ON [Plan Revenue].[Rev ID]=[Plan Revenue Expense].[RevID])
      LEFT JOIN [Support Provider] ON [Plan Revenue Expense].[SP]=[Support Provider].[ID]
    WHERE
      (
        (
          ([Plan Revenue Expense].[First Day]>=[Expense Start "MM/DD/YYYY"])
          AND
          ([Plan Revenue Expense].[First Day]<=[Expense End "MM/DD/YYYY"])
        )
        OR
        (
          ([Plan Revenue Expense].[Last Day]>=[Expense Start "MM/DD/YYYY"])
          AND
          ([Plan Revenue Expense].[Last Day]<=[Expense End "MM/DD/YYYY"])
        )
      )
      AND NOT
      (
        [Plan Revenue].[Service]='111' OR
        [Plan Revenue].[Service]='222' OR
        [Plan Revenue].[Service] LIKE '333*'
      )
      AND NOT Customer.[Inactive?]=TRUE
    ORDER BY
      [Plan Revenue Expense].[Check To],
      Customer.LName,
      Customer.FName
  ) AS [VendorCusts]
GROUP BY
  [VendorCusts].[VendorPaid],
  [VendorCusts].[DE?],
  [VendorCusts].[IC?],
  [VendorCusts].[AG?],
  [VendorCusts].[GB?]
HAVING
  NOT ([VendorCusts].[DE?]=TRUE OR [VendorCusts].[IC?]=TRUE)
ORDER BY
  [VendorCusts].[DE?] ASC,
  [VendorCusts].[IC?] ASC,
  [VendorCusts].[AG?] ASC,
  [VendorCusts].[GB?] ASC,
  [VendorCusts].[VendorPaid] ASC;

This is more or less what I finally came up with and it appears to get the job done. Checked a few data points and they seemed to agree with the slightly more verbose version w/o a Count() function. So, I think the results are good.

Then I abstracted out the start and end of the date range so I could run a few different versions. And added a few other parameters I needed...

Seems like it's working as intended, if a bit kludge-y. Sometimes it's better to be 'right' than 'pretty.' ;)

Count for many-to-many relationship

You are close. In SQL Server, you have to GROUP BY before you state HAVING. Personally, I would do something like this:

    SELECT COUNT(id) Surveys_w_2_LandOwners
    FROM surveys s
        JOIN 
        (
            SELECT COUNT(1) AS owner_ct, property_id  -- derived-table columns need names in SQL Server
            FROM owners_properties
            GROUP BY property_id
            HAVING COUNT(1) > 1
        ) p ON p.property_id=s.property_id
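The same GROUP BY ... HAVING subquery pattern can be tried out with Python's built-in sqlite3 module standing in for SQL Server. The column owner_id and the sample rows are hypothetical; only surveys, owners_properties, and property_id come from the query above.

```python
import sqlite3

# Hypothetical data: property 1 has two owners, property 2 has one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE surveys (id INTEGER PRIMARY KEY, property_id INTEGER);
    CREATE TABLE owners_properties (owner_id INTEGER, property_id INTEGER);
    INSERT INTO owners_properties VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO surveys VALUES (1, 1), (2, 2), (3, 1);
""")

# The subquery keeps only properties with more than one owner;
# the outer query counts the surveys that point at those properties.
surveys_with_multiple_owners = conn.execute("""
    SELECT COUNT(s.id)
    FROM surveys s
    JOIN (
        SELECT property_id, COUNT(1) AS owner_ct
        FROM owners_properties
        GROUP BY property_id
        HAVING COUNT(1) > 1
    ) p ON p.property_id = s.property_id
""").fetchone()[0]

print(surveys_with_multiple_owners)  # 2
```

Surveys 1 and 3 both reference property 1, the only property with more than one owner, so the count is 2.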
