I have number of files, each containing all reviews for a specific hotel (one file corresponds to one hotel, e.g. hotel_2194.dat). Each review is of the type:
<Author>PERKEISHA
<Content> I had a great time at...
<Date>Mar 8, 2008
<No. Reader>-1
<No. Helpful>-1
<Overall>1
<Value>1
<Rooms>1
<Location>5
<Cleanliness>5
<Check in / front desk>5
<Service>5
<Business service>1
I constructed the following relation:
Review (Hotel_ID, Author, Content, Date, No. Readers, No. Helpful, Overall, Value, Rooms, Location, Cleanliness, Check in, Service, Business service)
However, I am having trouble identifying the functional dependencies and candidate keys. I was thinking of something like (Hotel_ID, Author, Content, Date)
for a primary key but then I realized that the same Author
may post the same Review
(in terms of contents) about the same Hotel
on the same Date
and just change his overall mark. Am I thinking too hard about this? I'd be grateful if someone could suggest some functional dependencies and keys.
Best Answer
Content
looks like a free-form text field - that generally makes it a horrible choice for a primary key column.Hotel_ID
,Author
, andDate
are probably reasonable (especially ifDate
is "date of stay at hotel", rather than "date of posting of review").I would generally expect that an updated review would replace the original. If you want to retain older version, I would include both "date of stay" and "date of posting" (and both could be part of the key); this would allow the same person to have a review for each stay (as ratings for a stay last month could remain valid, even if a review of a later stay had different ratings).
It would be likely that you would want a Hotel table, and an Author table (to hold information about those entities that wouldn't change from review to review (hotel info, author's account info (assuming some sort of login process).
NOTE: In practice, I would probably want to treat
Hotel_ID
,Author
, and the two date fields as a functional key (must be unique), but use an auto-incrementing integer as the actual, literal primary key (saves space on indexing and in any tables that might need to reference it, easier to use in queries, etc.)