Relational Algebra Queries – How to Approach Problems

relational-theory

I have the following schema:

employee(sin, emp_name, emp_address, salary, supervisor_sin, dept#)

works_on(emp_sin, project_number, hours)

The question is:

Assume that no employee works on more than 2 projects. For those employees
working on two projects, give the employee name and the two project numbers.
Each employee name should appear once only

My solution so far:

R1 = works_on [theta-join]emp_sin=d.emp_sin [Rename]d(works_on) – gets tuples in the form "emp_sin | project_number | d.project_number | hours | d.hours"?

Result = [Projection]emp_name,project_number,d.project_number(employee [theta-join]sin=emp_sin R1) – gets all names, and 2 project numbers in the form "emp_name | project_number | d.project_number"?

Is this correct thinking? Am I missing something?

Best Answer

I think you're on the right track, but I'm not seeing a provision for restricting the results to employees who are actually working on two projects and the rename notation is confusing to me. You'll probably want to leverage a theta-join on a self join to accomplish this.

To start out, I would rename the relevant attributes in the work table in preparation for a self-join:

ρsin/emp_sin,project_number1/project_number works_on

Then project just those attributes for simplicity's sake:

πsin,project_number1 ( ρsin/emp_sin,project_number1/project_number works_on )

For visualization purposes, the equivalent SQL here would be something along the lines of:

SELECT  emp_sin 
            AS sin,
        project_number
            AS project_number1
FROM    works_on;

I'd then prepare for the self join by creating another relationship, renaming the project_number differently:

πsin,project_number2 ( ρsin/emp_sin,project_number2/project_number works_on )

At this point, a theta-join can be applied, exploiting the maximum of 2 carnality as defined in the assumption. This theta-join creates a relationship with both project_numbers by emp_sin:

( πsin,project_number1 ( ρsin/emp_sin,project_number1/project_number works_on ) ) ⋈sin, project_number1 < project_number2 ( πsin,project_number2 ( ρsin/emp_sin,project_number2/project_number works_on ) )

In effect, this is similar to SQL as follows:

SELECT  *
FROM (  SELECT  emp_sin 
                    AS sin,
                project_number
                    AS project_number1
        FROM    works_on ) AS w1
INNER JOIN (
        SELECT  emp_sin 
                    AS sin,
                project_number
                    AS project_number2
        FROM    works_on ) AS w2 
    ON  w1.sin = w2.sin
    AND w1.project_number1 < w2.project_number2;

The little exploit in the requirements here lets us do a natural join and a final projection to finish up ( since I've already renamed the emp_sin attribute to sin ):

πemp_name, project_number1, project_number2 ( employee ⋈ ( πsin,project_number1 ( ρsin/emp_sin,project_number1/project_number works_on ) ) ⋈sin, project_number1 < project_number2 ( πsin,project_number2 ( ρsin/emp_sin,project_number2/project_number works_on ) ) )

Again, as visual proof, this is approximately the equivalent of the following SQL, shown in action in this SQL Fiddle:

SELECT  e.emp_name, w1.project_number1, w2.project_number2
FROM    dbo.employee e
INNER JOIN (
        SELECT  emp_sin 
                    AS sin,
                project_number
                    AS project_number1
        FROM    dbo.works_on ) AS w1
    ON  e.sin = w1.sin
INNER JOIN (
        SELECT  emp_sin 
                    AS sin,
                project_number
                    AS project_number2
        FROM    dbo.works_on ) AS w2 
    ON  w1.sin = w2.sin
    AND w1.project_number1 < w2.project_number2;