Relational Algebra – Find Highest Graded Student in Each State

relational-algebrarelational-theory

Suppose I have a table of students, containing their ID, grade and state:

-------------------------
| id  |  grade  | state |
------------------------
|  1  |    83   |   CA  |
|  2  |    94   |   TX  |
|  3  |    92   |   WA  |
|  4  |    78   |   CA  |

And I want the ID of the students in each state with the highest grade (ex. 1, 2 and 3), how would I go about doing that?

I know how to find the maximum (can do the cross product (renaming as R1 and R2) and then select R1.grade < R2.grade for those who aren't the top, and subtract that from the original database). But I'm confused at how to do that for each state.

Best Answer

I don't actually feel very comfortable with relational algebra, so, I'll do it first using standard SQL and then use a tool called RelaX - relational algebra calculator 0.18.2 to do the translation.

First, the table you wrote, I'll call it students, and define it and fill it with:

CREATE TABLE students
(
    id INTEGER PRIMARY KEY,
    grade INTEGER,
    state TEXT
) ;

INSERT INTO students
    (id, grade, state)
VALUES
    (1, 83, 'CA'),
    (2, 94, 'TX'),
    (3, 92, 'WA'),
    (4, 78, 'CA') ;

RelaX will translate this into a dataset, represented by the following tuples:

group: Joan (imported from SQL)

students = {
    id:number, grade:number, state:string
    1        , 83          , 'CA'        
    2        , 94          , 'TX'        
    3        , 92          , 'WA'        
    4        , 78          , 'CA'        
}

In order to find what you're looking for, we first need a table with tuples in the form (state, grade), having the maximum grade of each state. This query is done, in SQL with a MAX(grade) per state using a GROUPs BY state. You would write it like:

SELECT
    state, max(grade) AS grade
FROM 
    students AS s2 
GROUP BY
    state ;

Next, you need to JOIN this table (that is named max_grades) to the students one, and you do it ON equal states, and equal grade (i.e.: the max grade per state)...

SELECT
    s1.id
FROM
    students AS s1
    JOIN 
    (
        SELECT
            state, max(grade) AS grade
        FROM 
            students
        GROUP BY
            state
    ) AS max_grades 
    ON s1.state = max_grades.state AND s1.grade = max_grades.grade

... and this gets translated by RelaX to the following relational algebra expression and reponse:

π s1.id ρ s1 students ⨝ s1.state = max_grades.state and s1.grade = max_grades.grade ρ max_grades ( π state, grade γ state; MAX(grade)→grade ρ s2 students)

s1.id
1
2
3

enter image description here

NOTE1:

  1. If several students of one state would have the max grade, this expression would return ALL of them, not just an arbitrary one of that state.

Alternative:

If you cannot GROUP BY, you can use another construct:

SELECT DISTINCT
    id
FROM
    students
EXCEPT
SELECT
    s1.id
FROM
    students AS s1
    JOIN students AS s2 ON s1.state = s2.state AND s1.grade < s2.grade

This goes a bit more in-line with your original thinking, although I personally find it less clear...

The translation to relational algebra is:

π id students - π s1.id ρ s1 students ⨝ s1.state = s2.state and s1.grade < s2.grade ρ s2 students

enter image description here