Mysql – Normalize repeated indefinite ongoing dates for an event

database-designMySQLnormalization

I am designing a calendar database that has a lot of booking and billing information. The database itself consists of form records, which are editable, and only when an event is confirmed by staff it appears on the web calendar. The web calendar does not have a separate table, it merely queries the database for recent confirmed events.

Originally the submission form was used by both staff and public and went into one giant table. I've split the booking form data into the following tables in order to normalize it:

`form_control`
    form_id
    default_event_date_id     (usually same as primary)
    primary_event_date_id     (foreign key event_dates)
    primary_event_host_id     (foreign key event_speakers)
    overall_start_date_time
    overall_end_date_time   
    interval_if_ongoing       (e.g. "1:MW," = first mon/wed of every month)
    is_confirmed              (bool)
    visibility                (bool) 
    billing_info etc.

`event_dates`
    event_id
    event_form_id       (one-to-one or many-to-one form relation)
    event-specific info (description etc.)

`event_speakers`
    person_id
    event_date_id           (many-to-one event relation)
    name, contact info etc. (specific to form submitted)

`ongoing_dates`
    date_id
    form_control_id       (check if confirmed & overall visible, get default event_id)
    custom_event_date_id  (if date is the primary, sole, or custom event date; else NULL)
    start_date_time
    end_date_time
    is_happening          (NULL = reserve 0 = dark/unused 1 = scheduled)

form_control: contains data that pertains to the entire event request or series.

  • "default" start date, end date; these limit number of events and describe a default event time.
  • ongoing interval (if end date > start date): this is my main problem (see below).
  • "default" calendar description.
  • visibility options; determines if and when the event appears on web calendar.
  • confirmation status; determines if the event appears on the calendar at all.
  • rates and billing options; these are currently series-specific and pertain only to the master form. To make them date-specific (not currently desired) I'd have to separate them into another table, but these data are logically isolated from the rest of the form.

(Not sure if worth it to handle edge cases, but I assume joins are not required to simply rebuild each form, if the following tables are strictly dependent. I.e. if rates were separated out, I would already know the form_id for a given billing_info or event_date and simply SELECT for it, correct?)

event_dates: describe calendar info pertinent to each event date.

About 80% of events only have one date. Per the above: Should the "default" (i.e. primary) event_id for each series be placed in form_control? Should it be a boolean (is_primary, is_default) in event_dates? Or should it be a field in event_dates that is UNIQUE foreign key (use_primary = form_id), default NULL? (to mark it as the default, or primary record) I want to structure it so that the php can't accidentally cause data integrity issues by setting non-unique default child record ids (primary date, default billing) for each form.

event_speakers: speaker info unique to a given date or form (as submitted).

The idea here is that old info from the date-of would be preserved even if the canonical info for a given speaker in the contacts db is updated. the form could AJAX query the contacts db to allow read-only hinting if a speaker appears multiple times, in which case the info listed in the form would be overwritten with the then-existing contact info for the speaker and a reference to the unique key.

event_speakers would be many-to-one relational to each event date. If a speaker is overall host, then they would be listed as default for the entire form, but I am not sure where best to put this, as a flag or foreign key (same question as above.)

Like primary event date, primary host info would be unique to each form, one-to-one, but would be in the same format / table as all other event_dates or speakers respectively. Perhaps I should separate them into a cloned table for primaries only? The problem with that is, the client may wish to allow a different speaker to be made primary, or allow a different event in a series to be the primary date without having to INSERT/DELETE anything.

OK, so here's my main problem:

ongoing_dates: list all actual dates in which the event occurs, queried by the web calendar.

I currently have event_dates as listing custom event info only, i.e. dates with something different going on than the primary event_date — different lineup, different description, or different hours. (Different lineup would simply be a reference to different event_date_id in event_speakers).

However, this leaves me with ongoing_dates that simply repeat at intervals (some ad infinitum). I can put the repeat-interval (or date-mask, e.g. mTW) in form_control, but this only helps to set dates, not retrieve them.

And I realized I needed to track dates that are not in the calendar, such as "every second wednesday EXCEPT July 1st. ALSO INCLUDING July 20th."

I figure the best way of handling this is to use the info in form_control to set or generate dates but then write a record in ongoing_dates upon-submit for every custom or blackout date, (otherwise scheduled dates for which it is reserved, or dark), with default = reserved. (reserved would be treated as a suggestion if the event is not yet confirmed, i.e. pending).

That way, only primary, default, or custom dates (dates for which there is any data other than "it's happening again") would have their own event_dates record created. The rest would go in ongoing_dates.its_happening to be simply queried by the web calendar. Possible values for its_happening would be e.g. "it's happening again", "it's not happening this particular interval", or… "it may be happening", which can be represented as a simple TRUE-FALSE-NULL.

ongoing_dates would also have a column for: event_start and end_time (using form_control start and end time as default, allowing custom times without creating a record) and, importantly, an optional event_dates foreign key. (i.e. every event_dates record would have a 1-1 relationship with ongoing_dates, but 99% of ongoing dates would not have a separate event_dates record, unless one is created by simply adding the ID.) This way the calendar can add and delete ongoing_dates at will without losing any information (e.g. if the exact same event is made once a month instead of twice a month)

My question is, does this approach make sense from a normalization standpoint?

Should I separate time and visibility info entirely into ongoing_dates, so that each event_dates record refers to a date that is only listed in ongoing_dates? Or must every date in the calendar have its own full record (with mostly-default information) and the calendar queries only that? Or does it not make sense to have a separate (small) record for potentially indefinite repeating dates?

Those seem to to be my three options. I don't want the database to fill up with extraneous info simply because one event happens every day of the week.

Best Answer

To focus on ongoing and future dates...

  • A table with suitable columns for simple cases of repeated actions. One row per repeating event. No specific dates except start and end.
  • A table for exceptions (extra dates, skipped dates, etc). This has a column with a specific date. This table could also handle simple, one-time, events.
  • Application code that reads those tables, and computes whether or not one of those events occurs on a particular day.

One important point here is: Do not try to do all the work in SQL; leave the messy stuff for the 'real' programming language in your application. The database is just a "repository of information" and a "source of truth". That is, store the raw data there (and only there).

Once you have made the split between what can and should be done in the database versus what can and should be done in the app, you may find the need to make minor tweaks in the boundary between them.

I don't think we can make all the decisions before starting. Instead, I like to plan to have a mid-course-correction.