Database design for surveys and experiments

database-design

I have only a bit of experience in working with databases, but rather theoretically during my studies and not so much practically. I was wondering if someone could help me figuring out how to design a good relational database for this data setup.

The setup is this. I have a set of predefined surveys that are used to evaluate subjects on certain aspects during some experiments. Each of these surveys is composed of a set of items. The items are combined to a set of factors, e.g. items 1, 4 and 8 build factor A, so the survey can be evaluated. The surveys are something like general templates that can be used by experimenters.

In an experiment each subject fills out the survey resulting in a value for each item of the survey. I don’t need to store the results for each subject but rather the mean across subjects for each item. Items are combined to factors in any combination, which is predefined by the survey, and for each factor a value can be calculated.

I want a database that contains all surveys and stores the information for each experiment where a survey is used, that means, the experiment should link to a particular survey and the survey should be linked to the items that compose it with their values, the factors with their values and which items compose which factors. Take into account that each survey can be used in multiple experiments.

The thing is, and this is where it gets complex, for some surveys I won't have the items but only the factors so I should be able to store the information on factors without relying on the items being there. Another thing is that usually the experimenters use the predefined items and factors of a survey. However, sometimes it happens that they use some additional items or factors, or they compose the factors differently. I want to be able to distinguish which items and factors were originally from the survey and which are added extra (or were changed).

So in summary I want to store this information:

  • Experiment
  • Survey used on the experiment
  • Original items of the survey and their respective values (if
    available)
  • Original factors from the survey, their respective values and the
    items that compose them
  • Extra items from the experiment that are not linked to the survey (if
    available)
  • Extra (or changed) factors from the experiment that are not linked to
    the survey

Take into account that one factor could be composed with original items and extra items. Any combination could happen. An extra factor could be defined with original and extra items too.

Any ideas what could be a good efficient way to define the structure of this database? I tried around already, but I always have circular references somewhere. And the result seems to be very complex.

Best Answer

You may wish to start with http://tdan.com/data-model-quality-where-good-data-begins/5286 Particulary i found following technique very useful when building relationships:

Relationship Names
Your author studied under Richard Barker and is a strong believer in the Oracle/SSADM[5] standard structure for relationship names. It provides discipline, and it permits the relationships to be read as easy-to-understand but precise English sentences. When models are being presented to the public, this is extremely valuable. Each of the two roles in a relationship should be named with this structure:

Each
<entity type 1>
must be           (Side of relationship line closest to first entity type is solid)
(or)
may be             (Side of relationship line closest to first entity type is dashed)
<relationship>
one and only one  (No crow’s foot next to second entity type)
(or)
one or more       (Crow’s foot next to second entity type)
<entity type 2)

For example, in Figure 10, the first relationship consists of the following two roles:
• Each ORDER must be from one and only one PARTY.
• Each PARTY may be a buyer in one or more ORDERS.
The second relationship consists of the following two roles:
• Each ORDER must be to one and only one PARTY.
• Each PARTY may be a seller in one or more ORDERS.
(Note that these sentences are much more graceful than something like “A PARTY buys in zero, one, or more ORDERS”. Non-data modelers don’t talk like that.)