Find the highest graded student using Tuple Relational Calculus

database-theoryrelational-calculusrelational-theory

Suppose I have a table of students, containing their ID and their grade:

-----------------
| id  |  grade  |
-----------------
|  1  |    83   |
|  2  |    94   |
|  3  |    92   |
|  4  |    78   |

How do I write a Tuple Relational Calculus formula that refers to the student with the highest grade?

My attempt:

Thinking in terms of SQL, I would write a query that does a cartesian product of the table with itself, take every grade that is less than some other grade, and then subtract from the original table. However, in Tuple Relational Calculus, it's not possible to build sub-tables in sub-queries, which is why I'm stuck.

However, I've attempted something in this direction:

{ <id> | Ǝ grade1 ∈ students (id, grade) ⋀ Ǝ grade2 ∈ students (id, grade2) ⋀ grade1 > grade2}

I believe this would get me the lower grades, but how do I subtract this all from the original table of students? I'm not allowed to insert this statement into another TRC query. Thanks in advance for your help!

Best Answer

Given the schema Students(id:integer,grade:integer), you can solve the problem in tuple relational calculus by using the negation operator (¬).

{T1.id | ∃T1 ∈ Students ¬(∃T2 ∈ Students (T2.grade > T1.grade))}

This would return the id of all students in T1 where there is no student in T2 with a higher grade.

Because T1 and T2 are from the same relation, this effectively returns the set of students with the top grade.

If there are no ties for the top grade, it will return just one id.

It looks like you were thinking in terms of relational algebra rather than tuple relational calculus.

Tuple relational calculus does not have a set difference operator, so you can't find the maximum by subtracting all non-maximums.

Reference

The Solutions Manual for the third edition of Database Management Systems by Ragu Ramakrishnan and Johannes Gerke helped me solve this.

Question 4.3 asks you to solve some problems using this schema:

Suppliers(sid:integer,sname:string,address:string)
Parts(pid:integer,pname:string,color:string)
Catalog(sid:integer,pid:integer,cost:real)

Question 4.3.11 is similar to yours.

Find the pids of the most expensive parts supplied by suppliers named Yosemite Sham.

The solution looks like this:

{T | ∃T1 ∈ Catalog(∃X ∈ Suppliers
(X.sname = 'Yosemite Sham' ∧ X.sid = T1.sid) ∧ ¬(∃S ∈ Suppliers
(S.sname = 'Yosemite Sham' ∧ ∃Z ∈ Catalog
(Z.sid = S.sid ∧ Z.cost > T1.cost))) ∧ T.pid=T1.pid)}

The solution here is more complicated because of the joins between Suppliers and Catalog, but the essence of the solution is the same.

Elide the joins to get this:

T1 ∈ Catalog ...
(... ¬(... ∃Z ∈ Catalog
(... Z.cost > T1.cost)))

In the reference solution, free variable T has the same pid as bound variable T1.

Taking the pid of T1 directly is a simpler way to get the same result.