I am looking for advice on the best relational modelling approach to store surveys, questions, and responses.
I am looking for which of the two approaches below looks best, or an alternative approach to either.
I have at least these entities:
- question
- survey
- person
And at least these relationships:
- Each survey has 1 or more questions.
- Each question may be used in 0 or more surveys.
- Each person may take 0 or more surveys.
Here is where I run into trouble: how to model the responses to survey questions made by a person.
Here are two approaches I've considered, neither of which seems very good to me. The diagrams here are greatly simplified to illustrate the issue.
What I don't like about this approach:
- The
survey_person_question_response
table has two different columns that refer to a survey:survey_question_survey_id
andsurvey_person_survey_id
- It would be an error to have different
survey_id
's referenced in one row for these two columns. The survey_question must be from the same survey as the person took in survey_person. I can't see a good way to enforce this.
- It would be an error to have different
- It seems like what I am doing here is making a relationship between two relationships. That feels wrong to me for some reason.
Approach 2:
Try to avoid two FKs from approach 1 that should refer to the same value…
What I don't like about this approach:
- There is no enforcement that the
question_id
andsurvey_id
FKs are from a validsurvey_question
pair - There is no enforcement that the
survey_id
andperson_id
FKs are from a validsurvey_person
pair
Any advice on:
- Whether one of these approaches is a typical approach
- The pros and cons of one of these approaches over the other
- A better way to arrange this data entirely
Would be greatly appreciated!
Best Answer
As per my understanding of your specifications, your business environment involves a conceptual-level ternary relationship. In this regard, you need to define:
So, I consider that you are on the right track with your Approach 1, although it requires some small (yet important) refinements in order to make it more accurate. I will detail such refinements and other relevant considerations in the following sections.
Business rules
Let us expand the applicable business rules a bit and reformulate them in the following way:
Expository IDEF1X diagram
Then, I have created the IDEF1Xa diagram that is presented in Figure 1, which synthesizes the business rules formulated above:
a Integration Definition for Information Modeling (IDEF1X) is a highly recommendable modeling technique that was established as a standard in December 1993 by the United States National Institute of Standards and Technology (NIST). It is solidly based on theoretical work authored by the sole founder of the relational model, i.e., Dr. E. F. Codd and also on the entity-relationship view developed by Dr. P. P. Chen.
The PersonSurvey relationship
As I see it, the PersonSurvey relationship is required to provide a means of authorization so that a Person can take part in a given Survey. In this way, once a certain Person has been registered in a specific Survey, he or she is authorized to provide Responses to the Questions that integrate the respective Survey.
The SurveyQuestion relationship
I assume that the property (or attribute) called suvery_question.question_number in your diagram is used to represent the Order of presentation of a given Question instance with respect to a particular Survey. As you can see, I have denoted such property as SurveyQuestion.PresentationOrder, and I think that you should prevent that (i) two or more Question.QuestionNumber values share (ii) the same PresentationOrder value in (iii) the same SurveyQuestion occurrence.
To portray that need, I have included a composite ALTERNATE KEY (AK) in the box representing this entity type, which is comprised of the combination of properties (SurveyNumber, QuestionNumber, PresentationOrder). As you are well aware, a composite AK can be declared in a logical DDL design with the aid of a multi-column UNIQUE constraint (as I exemplified in the
SurveyQuestion
table that is part of the expository DDL layout expounded a few sections below).The Response entity type
Yes, with the Response entity type I am depicting a relationship between two other relationships; it may seem awkward at first glance but there is nothing wrong with this approach, as long as it (a) represents the features of the business context of interest accurately and (b) is represented properly in a logical-level layout.
Yes, you are totally correct, it would be an error to portray that part of the scenario at the logical level of abstraction by means of two
Response.SurveyNumber
(or, say,Response.SurveyId
) values referenced from two different columns in the sameResponse
row.Derived logical SQL-DDL layout
Two composite FOREIGN KEYs in the
Response
tableThis is, probably, the most important point to discuss: the references made from a given
Response
row toSurveyQuestion.SurveyNumber
, andSurveyPerson.SurveyNumber
must have matching values. As far as I am concerned, the best option to enforce this condition in a declarative way is by making use of two composite FOREIGN KEYs (FKs).
As shown in the DDL design, the first FK is making a reference to the
PersonSurvey
table PRIMARY KEY (PK), i.e.,(PersonId, SurveyNumber)
, and is conformed by the columnsResponse.PersonId
andResponse.SurveyNumber
.The second FK is pointing to the
SurveyQuestion
table PK, i.e.,(SurveyNumber, QuestionNumber)
, and is, accordingly, made up of the columnsResponse.SurveyNumber
andResponse.QuestionNumber
.In this way, the
Response.SurveyNumber
column is quite instrumental since it is used as part of a FK reference in two different constraints.With this method, one ensures database management system-guaranteed referential integrity from
Response
to thePersonSurvey
;Response
to theSurveyQuestion
; andPerson
,Survey
andQuestion
.Derived data to avoid update anomalies
I have noticed in your diagram two elements that I esteem are worth mentioning. These elements are related to two
PersonSurvey
columns that can (should) be derived.In that regard, you can derive the
PersonSurvey.IsStarted
datum by querying if a givenPerson
occurrence has provided one or moreResponses
toQuestions
that integrate an exactSurvey
via theSurveyQuestion
table.And you can also obtain the
PersonSurvey.IsCompleted
data point by determining if a givenPerson
instance has supplied aResponse
to all theQuestions
that cointain a value of 'TRUE' in theIsMandatory
column in a specificSurveyQuestion
row.By way of the derivation of these values, you are preventing some update anomalies that would have eventually arisen in case you had kept such values in the
SurveyQuestion
column.Important consideration
As @Dave rightly points out in his comment, if you face a future requirement demanding the management of different kinds of responses that imply managing dates, numeric values, multiple choice, and another possible aspects, you will have to extend this database layout.