Database Design – Storing Multi-Select Data with Single-Select Data

database-design

I've got a database with some multi-select fields. To keep it simple, let's just say those fields are MatchGender and Interests. I want users to be able to find people who match (hence Gender is a multi-select; they can choose to match against one or both genders) and interests/hobbies. Along with those fields, there are several "single-select" fields such as the user's city, state, their gender, etc…

I'm trying to figure out the best way to store this data so it's easily searchable. Do I store the multi-select data in a single field that is comma delimited? Do I store it in a separate table with a UserID? Pros/Cons of each?

By the way, this is a SQL Server back-end database that will be accessed through an ASP web interface.

Best Answer

Specific answers

First of all, here is my personal approach to your specific questions:

Do I store the multi-select data in a single field that is comma delimited?

No, I do not recommend you to proceed that way. Since you have the chance to make use of a relational database management system (RDBMS), I personally consider that it would be highly convenient and reasonable to design and implement your database in a relational fashion. I am convinced that there is no need to use a format as limited as CSV when one can take advantage of the robustness of relational tools. If you opt for the CSV method, you would be adding unnecessary complexity to your project.

  • Pros. CSV format has its use in certain contexts, e. g., supplying information interchange between applications that handle data in incompatible formats. Another feature of CSV is that it is easily modified by spreadsheet aplications such as MS Excel or LibreOffice Calc. Anyway, due to the fact that you are going to use a RDBMS, I do not think any of these facts is relevant in terms of database design or storage in a relational platform. If you come across the need to deliver some of your data in CSV format in the future, then you can export your structured data to a comma delimited file without undermining your database soundness.

  • Cons. As you may already know, by storing CSV data in a single column, you would be breaking First Normal Form. This way, having unnormalized data (nonsimple or multivalued columns), you lose many of the essential technical advantages that are provided by RDBMS which are firmly grounded in relational theory. Since many people have rightly enumerated the abundant problems that arise by storing data in CSV format in an RDBMS, I will only mention that you would be putting at risk the data integrity (which is widely considered one of the most important properties of a database), the performance of your system would be substantially diminished and you will end up re-implementing in your own application many of the capabilities that RDBMS offer natively, which have been vastly optimized over the years.

Do I store it in a separate table with a UserID?

Yes, to me that is the best approach and can be easily implemented in a relational system by means of normalization.

As I understand your specifications, MatchGender represents the gender of potential partners with which a given User is interested in getting in touch. So, the assertions a User seeks one-to-many MatchGenders and a MatchGender is sought by one-to-many Users are business rules of particular importance in this specific context.

If the above is correct, then there is a many-to-many relationship between User and MatchGender, and this kind of scenarios are very common in relational database development. I will model this aspect and offer an implementation example below, in my suggestion.

  • Pros. You can expect predictable results since your database will be built on relational principles. You can take full advantage of RDBMS capabilities, such as aggregate functions, JOINs, referential integrity, etc. You can prevent data redundancy and full table scans. Your database structure will be more flexible for future changes or extensions. You can deliver database performance that largely meets the needs of your clients. Your data retrieval will be straightforward.

  • Cons. At first glance, the relational approach may seem a bit cumbersome, since it requires the database designers and programmers to be more disciplined but, as the project grows, that discipline will pay off.

Other relevant aspects of your scenario

I want to describe some other points about my understanding of your situation. This way, I think that the statements a User pursues one-to-many Hobbies and a Hobby is pursued by one-to-many Users are important business rules as well. So, there is also a many-to-many relationship between the entities User and Hobby, and this also has to be modelled and implemented.

I assume that you will have a set of Hobbies that will be inserted into the database by some sort of system administrator. If so, a HobbyCode that uniquely identifies a given occurence of a Hobby may be defined and you can set it as the meaningful PRIMARY KEY of such table.

Modeling and implementation suggestion

Therefore, to deal with your situation, I propose the following IDEF1X data model:

Users Interests DM

And then, the next DDL can be derived and physically implemented with ease from such data model:

CREATE TABLE Gender
(
    GenderCode      CHAR(1)  NOT NULL,
    Name            CHAR(11) NOT NULL,
    CreatedDateTime DATETIME NOT NULL,
    CONSTRAINT PK_Gender PRIMARY KEY CLUSTERED 
    (
        GenderCode ASC 
    ),
    CONSTRAINT UIX_Gender_Name UNIQUE
    (
        Name
    )
)

CREATE TABLE Hobby
(
    HobbyCode       CHAR(2)  NOT NULL,
    Name            CHAR(30) NOT NULL,
    Descriptionn    CHAR(90) NOT NULL,
    CreatedDateTime DATETIME NOT NULL,
    CONSTRAINT PK_Hobby PRIMARY KEY CLUSTERED 
    ( 
        HobbyCode ASC 
    ),
    CONSTRAINT UIX_Hobby_Name UNIQUE
    (
        Name
    )
)

CREATE TABLE UserProfile
(
    UserId          INT      NOT NULL,
    FirstName       CHAR(30) NOT NULL,
    LastName        CHAR(30) NOT NULL,
    BirthDate       DATETIME NOT NULL,
    GenderCode      CHAR(1)  NOT NULL,
    CreatedDateTime DATETIME NOT NULL,
    CONSTRAINT PK_UserProfile PRIMARY KEY CLUSTERED 
    (
        UserId ASC
    ),
    CONSTRAINT UIX_UserProfile_FirstName_LastName_and_BirthDate UNIQUE
    (
        FirstName ASC,
        LastName  ASC,
        BirthDate ASC
    ),
    CONSTRAINT FK_UserProfile_Gender FOREIGN KEY 
    (
        GenderCode
    ) 
    REFERENCES Gender (GenderCode)
)

CREATE TABLE UserMatchGender
(
    UserId          INT      NOT NULL,
    GenderCode      CHAR(1)  NOT NULL,
    CreatedDateTime DATETIME NOT NULL,
    CONSTRAINT PK_UserMatchGender PRIMARY KEY CLUSTERED 
    (
        UserId     ASC,
        GenderCode ASC
    ),
    CONSTRAINT FK_UserMatchGender_UserProfile FOREIGN KEY 
    (
        UserId
    )
    REFERENCES UserProfile (UserId),
    CONSTRAINT FK_UserMatchGender_Gender FOREIGN KEY 
    (
        GenderCode
    )
    REFERENCES Gender (GenderCode)
)

CREATE TABLE UserHobby
(
    UserId          INT      NOT NULL,
    HobbyCode       CHAR(2)  NOT NULL,
    CreatedDateTime DATETIME NOT NULL,
    CONSTRAINT PK_UserHobby PRIMARY KEY CLUSTERED 
    (
        UserId    ASC,
        HobbyCode ASC
    ),
    CONSTRAINT FK_UserHobby_UserProfile FOREIGN KEY 
    (
        UserId
    )
    REFERENCES UserProfile (UserId),
    CONSTRAINT FK_UserHobby_Hobby FOREIGN KEY 
    (
        HobbyCode
    )
    REFERENCES Hobby (HobbyCode)
)