Postgresql – Duplicate results from query trying to find cross-indexed multiple tables

join;postgresql

I working with a database of several hundred schemas with 250 tables and thousands of records. PGAdmin III 1.22 as my only tool. A GUI interface to the database is not robustly written and occasionally cross-indexes some entries. I am trying to write a query that shows a list of the cross-indexed items and display their LABEL to the users rather than the ID. This allows the user to use the GUI program to fix the issue with out requiring postgresql knowledge.

Here's the query at this point:

SET search_path="SCHEMA_NAME";
SELECT 
  e1.label,
  c1.label,
  h1.contentsid,
  h1.group
FROM
  hostdata as h1
INNER JOIN hostdata as h2
  on h1.group = h2.group
  and h1.contentsid = h2.contentsid
  and h1.id != h2.id
INNER JOIN class as c1
  on t1.classid = c1.id
INNER JOIN element as e1
  on t1.elementid = e1.id
ORDER BY
  t1.group ASC,
  t1.contentsid ASC,
  c1.label ASC,
  e1.label ASC;

The cross-indexing occurs when the GROUP and CONTENTSID fields are both duplicated in two or more records in the HOSTDATA table. It's OK to have one duplicated as long as both fields are not the same. If it matters, the GROUP field is text.

What I am trying to do is compare the records in HOSTDATA to each other, list those that match another record in the GROUP and CONTENTSID fields, and then list them along with the LABELs from the CLASS and ELEMENT tables.

The above query works perfectly as long as there are only two matches, but as soon as the number of matches exceeds that the results multiply – 8 matches results in 7 returns per record – one for each match – 8 times 7 instead of just a list of 8. This is obviously because the first INNER JOIN returns multiple true conditions.

Performance is not an issue. I haven't yet come up with a way to re-arrange the sort and query to return the desired truncated output.

Best Answer

Never mind. I re-wrote it and this seems to work;

SET search_path="SCHEMA_NAME";
SELECT e1.label, c1.label, h1.contentsid, h1.group
FROM hostdata as h1
INNER JOIN (
    SELECT contentsid, group
    FROM hostdata 
        GROUP BY contentsid, group
    HAVING COUNT(id) > 1
    ) dup 
    ON h1.group = dup.group AND h1.contentsid = dup.contentsid
INNER JOIN class as c1
  on h1.classid = c1.id
INNER JOIN element as e1
  on h1.elementid = e1.id
ORDER BY t1.group, t1.contentsid;