Postgresql – Left join not working with sub-query

join;postgresqlsubquery

I've got table of scores where each entry corresponds to a particular (student, subject) pair.

CREATE TABLE score (
  id serial PRIMARY KEY,
  value integer NOT NULL,
  subject_id integer NOT NULL,
  student_id integer NOT NULL,
  CONSTRAINT s1_id FOREIGN KEY (subject_id) REFERENCES subject (id),
  CONSTRAINT s2_id FOREIGN KEY (student_id) REFERENCES student (id),
);

I want to pick the top 5 subjects with highest overall scores, and then compute the average score of each student across those 5 subjects. Some students may not have entries for some subjects. Those values would be given a default score.

Here's what I have:

SELECT student_id, AVG(COALESCE(score.value, default_value)) FROM 
(
    SELECT score.subject_id, subject.name, SUM(score.value) AS score_sum
    FROM score
    JOIN subject on subject.id = score.subject_id
    WHERE subject.name != 'skip me'
    GROUP BY score.subject_id
    ORDER BY score_sum DESC
    LIMIT 5
) AS score_sort
LEFT JOIN score ON score_sort.subject_id = score.subject_id 
GROUP BY student_id

The inner query works correctly to select the top 5. But the LEFT JOIN in outer query does not select the rows where a student does not have a score. What am I doing wrong here ?

Best Answer

You could use a CROSS JOIN from score_sort to all the students and then a LEFT join to score:

SELECT st.student_id, 
       AVG(COALESCE(sc.value, default_value)) AS average_score
FROM 
(
    SELECT score.subject_id, subject.name, SUM(score.value) AS score_sum
    FROM score
    JOIN subject on subject.id = score.subject_id
    WHERE subject.name != 'skip me'
    GROUP BY score.subject_id
    ORDER BY score_sum DESC
    LIMIT 5
) AS score_sort AS sort 
  CROSS JOIN student AS st
  LEFT JOIN score AS sc
    ON  sort.subject_id = sc.subject_id 
    AND   st.student_id = sc.student_id
GROUP BY st.student_id ;

A different approach would be to alter your query to count the available scores each student has (in the 5 top subjects):

SELECT score.student_id, 
       (SUM(score.value) + (5 - COUNT(score.subject_id)) * default_value) / 5 
           AS average_score
FROM 
(
    SELECT score.subject_id, subject.name, SUM(score.value) AS score_sum
    FROM score
    JOIN subject on subject.id = score.subject_id
    WHERE subject.name != 'skip me'
    GROUP BY score.subject_id
    ORDER BY score_sum DESC
    LIMIT 5
) AS score_sort
JOIN score ON score_sort.subject_id = score.subject_id 
GROUP BY score.student_id ;

But even with this, you would still not get any result for the students that have no score in any of the 5 top subjects.