Design a db table where rows may have NULL in multiple columns

database-designnormalization

I am trying to design a database for tour packages. Tour packages contains the detailed itinerary.

Tour package duration may be between 3 and 30 days. I need to add the description of each day.
If I make 30 columns like day_1, day_2, … day_30 .
While creating the package for 5 days the rest of 25 Columns will be empty/NULL.

There will be constraints that all packages should have less than 30 days.

How can I improve this design ?

Best Answer

You normally would have your data normalized. This means you will have one table for the tour packages, and another one for each days itinerary:

CREATE TABLE tour_packages
(
    tour_package_id INTEGER PRIMARY KEY,
    tour_package_name varchar(150),
    other_data varchar(1000)
) ;

CREATE TABLE package_schedules
(
    tour_package_id INTEGER NOT NULL REFERENCES tour_packages(tour_package_id) /* ON DELETE CASCADE */,
    day INTEGER NOT NULL,  -- Day 1 of itinerary, day 2, ...
    day_description varchar(1000) NOT NULL,

    -- The rows in this table are identified by a combination of tour_package_id and day
    PRIMARY KEY (tour_package_id, day),
    -- We can have the database check that we don't have day 0, or day negative
    CHECK (day >= 1),
    -- If you want to limit the number of days:
    CHECK (day <= 30)
) ;

With the proper CONSTRAINTS defined for your tables (which include in this case the PRIMARY KEYS, the REFERENCES and CHECKS) you can be quite sure that your data follows the rules.

The only thing that an SQL database (as of now) cannot do for you is to make sure that you don't have "missing days" just by declaring constraints.

That is, the situtation produced by the following INSERTs cannot be avoided, and should be taken care of at the application level:

We have one tour package:

INSERT INTO tour_packages
    (tour_package_id, tour_package_name)
VALUES
    (1, 'My first tour') ;

We add now the schedules for days 1 and 3, but we miss day 2:

INSERT INTO package_schedules
    (tour_package_id, day, day_description)
VALUES
    (1, 1, 'First day of first tour: Welcome') ;

-- We are missing day 2

INSERT INTO package_schedules
    (tour_package_id, day, day_description)
VALUES
    (1, 3, 'Third day of first tour: Going to visit Honolulu') ;

This could actually be accomplished on the unnormalized scenario that you provided, with a large collection of CHECK constraints; where each day_n can only be null if day_n+1 is also null. That is, all the NULL values must be on the right side and be consecutive.

CHECK(day_1  IS NOT NULL)
CHECK(day_2  IS NOT NULL OR (day_2 IS NULL AND day_3 IS NULL))
...
CHECK(day_29 IS NOT NULL OR (day_29 IS NULL AND day_30 IS NULL))

I'd say that the advantage that the unnormalized design has because of this is far outweighted by all the inconvenients you may have. The day that you will move from 30 day-max packages to 90 day-max will become absolutely obvious. The day that you want to query: Is there any package where any scheduled day moves through Honolulu? you will find out as well.

The "missing day" is usually dealt with at the application layer, not the database. The database could in theory also handle this (probably) by using triggers, or with a database that allows the definition of functions that can perform complicated checks, and put them into CHECK( no_empty_days(tour_package_id)). In practice, it's something I've seldomly found.


The previous definitions are standard SQL and run on all versions available in DBFiddle.uk (as of today). Take into account, however, that MariaDB (or MySQL) allow the declaration of CHECK constraints, but then do not enforce them.

You can check everything at dbfiddle here

Related Question