I have only a bit of experience in working with databases, but rather theoretically during my studies and not so much practically. I was wondering if someone could help me figuring out how to design a good relational database for this data setup.
The setup is this. I have a set of predefined surveys that are used to evaluate subjects on certain aspects during some experiments. Each of these surveys is composed of a set of items. The items are combined to a set of factors, e.g. items 1, 4 and 8 build factor A, so the survey can be evaluated. The surveys are something like general templates that can be used by experimenters.
In an experiment each subject fills out the survey resulting in a value for each item of the survey. I don’t need to store the results for each subject but rather the mean across subjects for each item. Items are combined to factors in any combination, which is predefined by the survey, and for each factor a value can be calculated.
I want a database that contains all surveys and stores the information for each experiment where a survey is used, that means, the experiment should link to a particular survey and the survey should be linked to the items that compose it with their values, the factors with their values and which items compose which factors. Take into account that each survey can be used in multiple experiments.
The thing is, and this is where it gets complex, for some surveys I won't have the items but only the factors so I should be able to store the information on factors without relying on the items being there. Another thing is that usually the experimenters use the predefined items and factors of a survey. However, sometimes it happens that they use some additional items or factors, or they compose the factors differently. I want to be able to distinguish which items and factors were originally from the survey and which are added extra (or were changed).
So in summary I want to store this information:
- Experiment
- Survey used on the experiment
- Original items of the survey and their respective values (if
available) - Original factors from the survey, their respective values and the
items that compose them - Extra items from the experiment that are not linked to the survey (if
available) - Extra (or changed) factors from the experiment that are not linked to
the survey
Take into account that one factor could be composed with original items and extra items. Any combination could happen. An extra factor could be defined with original and extra items too.
Any ideas what could be a good efficient way to define the structure of this database? I tried around already, but I always have circular references somewhere. And the result seems to be very complex.
Best Answer
You may wish to start with http://tdan.com/data-model-quality-where-good-data-begins/5286 Particulary i found following technique very useful when building relationships: