Finding fuctional dependencies and candidate keys

database-designnormalizationrelational-theory

I have number of files, each containing all reviews for a specific hotel (one file corresponds to one hotel, e.g. hotel_2194.dat). Each review is of the type:

<Author>PERKEISHA
<Content> I had a great time at...
<Date>Mar 8, 2008
<No. Reader>-1
<No. Helpful>-1
<Overall>1
<Value>1
<Rooms>1
<Location>5
<Cleanliness>5
<Check in / front desk>5
<Service>5
<Business service>1

I constructed the following relation:

Review (Hotel_ID, Author, Content, Date, No. Readers, No. Helpful, Overall, Value, Rooms, Location, Cleanliness, Check in, Service, Business service)

However, I am having trouble identifying the functional dependencies and candidate keys. I was thinking of something like (Hotel_ID, Author, Content, Date) for a primary key but then I realized that the same Author may post the same Review (in terms of contents) about the same Hotel on the same Date and just change his overall mark. Am I thinking too hard about this? I'd be grateful if someone could suggest some functional dependencies and keys.

Best Answer

Content looks like a free-form text field - that generally makes it a horrible choice for a primary key column. Hotel_ID, Author, and Date are probably reasonable (especially if Date is "date of stay at hotel", rather than "date of posting of review").

I would generally expect that an updated review would replace the original. If you want to retain older version, I would include both "date of stay" and "date of posting" (and both could be part of the key); this would allow the same person to have a review for each stay (as ratings for a stay last month could remain valid, even if a review of a later stay had different ratings).

It would be likely that you would want a Hotel table, and an Author table (to hold information about those entities that wouldn't change from review to review (hotel info, author's account info (assuming some sort of login process).

NOTE: In practice, I would probably want to treat Hotel_ID, Author, and the two date fields as a functional key (must be unique), but use an auto-incrementing integer as the actual, literal primary key (saves space on indexing and in any tables that might need to reference it, easier to use in queries, etc.)