Relational Algebra – How to Count Distinct Entries in a Column

Tags: database-design, relational-algebra, relational-theory, relations, table

I have a table similar to this one. Each user has posted a review of one or more hotels (A, B, C, D), but on different dates, so there are no duplicate tuples even though a person may have reviewed the same hotel more than once.

I need to count the number of DISTINCT hotels each user has reviewed, using RELATIONAL ALGEBRA only. How can I do that?

An example to show the notation I use:

R = Ɣ User, COUNT(Hotel_reviewed) -> Num_Reviews (InitialRelation)

would give the number of reviews by each user (InitialRelation is table 1).

The result should be the following table:

Best Answer

Besides the more compact syntax (from @McNets' answer):

select   User,
         count(distinct Hotel_Reviewed) HotelsReviewed
from     InitialRelation
group by User;

we can also do a projection first to find distinct User, Hotel_Reviewed pairs and then aggregate:

select   User,
         count(Hotel_Reviewed) as Hotels_Reviewed
from     
    ( select distinct
               User,
               Hotel_Reviewed
      from     InitialRelation
    ) as D
group by User ;
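
As a quick sanity check, the two queries above can be run against a small in-memory SQLite database. This is only a sketch: the sample rows, the Review_Date column name, and the use of SQLite are assumptions for illustration and are not taken from the question.

```python
import sqlite3

# Hypothetical sample rows (user, hotel, review date); not the asker's data.
rows = [
    ("alice", "A", "2017-01-01"),
    ("alice", "A", "2017-02-01"),  # same hotel reviewed again on another date
    ("alice", "B", "2017-03-01"),
    ("bob", "C", "2017-01-15"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE InitialRelation ("User" TEXT, Hotel_Reviewed TEXT, Review_Date TEXT)'
)
conn.executemany("INSERT INTO InitialRelation VALUES (?, ?, ?)", rows)

# Compact form: COUNT(DISTINCT ...) inside the aggregate.
compact = conn.execute(
    'SELECT "User", COUNT(DISTINCT Hotel_Reviewed) '
    'FROM InitialRelation GROUP BY "User" ORDER BY "User"'
).fetchall()

# Two-step form: project to distinct pairs first, then a plain COUNT.
two_step = conn.execute(
    'SELECT "User", COUNT(Hotel_Reviewed) '
    'FROM (SELECT DISTINCT "User", Hotel_Reviewed FROM InitialRelation) AS D '
    'GROUP BY "User" ORDER BY "User"'
).fetchall()

print(compact)
print(two_step)
```

On this sample both queries return the same rows, which is the equivalence the second form relies on.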

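The projection-then-aggregate form leads us to the relational algebra notation. Because a relation is a set, the projection π onto User, Hotel_Reviewed already eliminates duplicate pairs, so a plain COUNT then counts distinct hotels: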

R = Ɣ User, COUNT(Hotel_Reviewed) -> Hotels_Reviewed 
        (π User, Hotel_Reviewed (InitialRelation)) -> D
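
The same idea can be modelled with plain Python sets, as a rough sketch of why projecting first makes the count distinct. The function names and sample tuples here are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical relation: a set of (user, hotel, date) tuples.
initial_relation = {
    ("alice", "A", "2017-01-01"),
    ("alice", "A", "2017-02-01"),
    ("alice", "B", "2017-03-01"),
    ("bob", "C", "2017-01-15"),
}


def project_user_hotel(relation):
    """Model of π User, Hotel_Reviewed: keep two attributes; the set drops duplicates."""
    return {(user, hotel) for (user, hotel, _date) in relation}


def count_by_user(tuples):
    """Model of Ɣ User, COUNT(...): group on the first attribute and count tuples."""
    return dict(Counter(user for (user, *_rest) in tuples))


# Grouping the raw relation counts reviews; grouping the projection counts hotels.
reviews_per_user = count_by_user(initial_relation)
hotels_per_user = count_by_user(project_user_hotel(initial_relation))

print(reviews_per_user)
print(hotels_per_user)
```

Here alice has three reviews but only two distinct hotels, so the two groupings differ exactly where a hotel was reviewed more than once.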

Related Solutions

Mysql – Peer review database design for various reports

I think you are doing well going for the 4th normal form and exploding your data. Now, as you found out, you need to de-normalise to support historical data: if the review must always count towards a particular department's score, whatever the current department of the reviewee_id is, then yes, you definitely need to add a department to the review table and make sure you document what it is there for and why.

Push the logic further: what if a 4th and a 5th category are added in a year? How do you calculate past and future averages then? On 3 or 5 categories? Then, another year later, the first category gets removed? Will you delete all scores for that category, or change the average's calculation yet again, based on the number of categories that existed on the date of each score?

Sometimes, too much normalisation is not the best course of action. You have a good core here. Time to custom-fit now. Take a step back, plan for the future, ask the end-users to make sure you support all their foreseeable needs, and look at the impacts of adding/removing/updating departments, employees and categories on historical reporting.

As for reporting, if you are using powerful data tools like IBM Cognos, it will be easy to generate data cubes, then let managers do their own reports. If you are using tools like Microsoft Excel with "external data", then the easiest course of action would be to pre-create Views or Stored Procedures that crunch up the data. The caveat here is that you will have to create a new View/SP for every manager that comes up with a new request/idea/variant.

Mysql – Creating Database Tables for Reviews functionality that incorporates Tags/Categories

The tables can be queried in a few different ways depending on what you want to do. If you wish to (Example 2) extract all the Reviews with a particular Category you can create a procedure like this:

SELECT r.review_id, professional_rating, efficiency_rating, referral_rating...
FROM Review r
INNER JOIN Review_Category rc ON r.review_id = rc.review_id
INNER JOIN Category c ON rc.category_id = c.category_id
WHERE c.review_category_name = 'foo'

For Example 1, if you want to insert a new review with 2 categories, you generally do this in 2 steps, handled by the application calling the database procedures. First insert the review with something like:

CREATE PROCEDURE Review_Insert (<review table values>, OUT NewReviewID bigint)
BEGIN 
    INSERT INTO Review (<column list>)
    VALUES (<value list>);

    -- Return the auto-increment id generated by the INSERT above
    SET NewReviewID = LAST_INSERT_ID();
END

You should then be storing the output of the NewReviewID in your application, which you then pass into the next procedure for associating the Review with the Category, something like:

CREATE PROCEDURE Review_Category_Insert (ReviewID BIGINT, CategoryID BIGINT)
BEGIN
    INSERT INTO Review_Category (review_id, category_id)
    VALUES (ReviewID, CategoryID);
END

This last one needs to be called for each Category that is associated with a Review.

You can also do this in a single trip to the database, if that is what you need. However, a review could have 1 associated category or 100, which makes defining the parameters of such a procedure difficult (not impossible, just difficult).
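
The two-step flow described above can be sketched from the application side as follows. This is a minimal illustration under stated assumptions, not the procedures themselves: it uses Python's sqlite3 module as a stand-in for MySQL, a hypothetical helper named insert_review_with_categories, and a Review table cut down to a single rating column. Passing the categories as a list sidesteps the fixed-parameter problem, and the transaction keeps the review and its links together.

```python
import sqlite3


def insert_review_with_categories(conn, rating, category_ids):
    """Insert one review, then link it to any number of categories, atomically."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "INSERT INTO Review (professional_rating) VALUES (?)", (rating,)
        )
        review_id = cur.lastrowid  # plays the role of LAST_INSERT_ID()
        conn.executemany(
            "INSERT INTO Review_Category (review_id, category_id) VALUES (?, ?)",
            [(review_id, category_id) for category_id in category_ids],
        )
    return review_id


# Simplified stand-in schema for the sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Review (review_id INTEGER PRIMARY KEY, professional_rating INTEGER)"
)
conn.execute("CREATE TABLE Review_Category (review_id INTEGER, category_id INTEGER)")

new_id = insert_review_with_categories(conn, 5, [10, 20, 30])
linked = conn.execute(
    "SELECT category_id FROM Review_Category WHERE review_id = ? ORDER BY category_id",
    (new_id,),
).fetchall()
print(new_id, linked)
```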
