Let's say you made the table like this
CREATE TABLE TrainEvents
(
id INT NOT NULL AUTO_INCREMENT,
station_arr_id INT NOT NULL,
station_dep_id INT NOT NULL,
arr_dt DATETIME NOT NULL,
dep_dt DATETIME NOT NULL,
...
PRIMARY KEY (id),
KEY station_arr_index (station_arr_id,arr_dt),
KEY station_dep_index (station_dep_id,dep_dt)
);
Next, you insert into the table the event
- Arriving from Station 21 on April 30, 2014
- Arrived at Station 29 at 10:00 AM on April 30, 2014
- Departing from Station 29 one minute later
Here is that INSERT query
INSERT INTO TrainEvents (station_arr_id,arr_dt,station_dep_id,dep_dt)
VALUES (21,'2014-04-30 10:00:00',29,'2014-04-30 10:01:00');
OK, now let's see how you can query for the following departures
- All departures from Station 29 around 10:00 AM on April 30, 2014
- Checking departures 15 minutes before and after
Here is that SELECT query
SET @MinuteWindow = 15;
SET @DepartureDT = '2014-04-30 10:00:00';
SET @DepartureDTMinBef = @DepartureDT - INTERVAL @MinuteWindow MINUTE;
SET @DepartureDTMinAft = @DepartureDT + INTERVAL @MinuteWindow MINUTE;
SELECT * FROM TrainEvents
WHERE station_dep_id = 29
AND dep_dt >= @DepartureDTMinBef
AND dep_dt <= @DepartureDTMinAft;
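The windowed lookup above can be sketched end to end in Python, using the stdlib sqlite3 module for illustration (the original schema is MySQL; table and column names follow it, and the window arithmetic is done in Python rather than with MySQL's INTERVAL syntax):

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE TrainEvents (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        station_arr_id INTEGER NOT NULL,
        station_dep_id INTEGER NOT NULL,
        arr_dt TEXT NOT NULL,
        dep_dt TEXT NOT NULL
    )
""")
conn.execute(
    "INSERT INTO TrainEvents (station_arr_id, arr_dt, station_dep_id, dep_dt) "
    "VALUES (21, '2014-04-30 10:00:00', 29, '2014-04-30 10:01:00')"
)

# Same 15-minute window on either side of the target departure time
window = timedelta(minutes=15)
dep = datetime(2014, 4, 30, 10, 0, 0)
fmt = "%Y-%m-%d %H:%M:%S"

rows = conn.execute(
    "SELECT * FROM TrainEvents "
    "WHERE station_dep_id = ? AND dep_dt BETWEEN ? AND ?",
    (29, (dep - window).strftime(fmt), (dep + window).strftime(fmt)),
).fetchall()
print(len(rows))  # the 10:01 departure falls inside the window
```

Because the datetime strings are zero-padded in YYYY-MM-DD HH:MM:SS order, plain string comparison in BETWEEN sorts them chronologically.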
EPILOGUE
Given your initial idea, you could change the station_arr_id and station_dep_id into a single route_id and store the routes in a route table
CREATE TABLE TrainRoutes
(
route_id INT NOT NULL AUTO_INCREMENT,
route_name VARCHAR(128) NOT NULL,
...
PRIMARY KEY (route_id),
KEY route_name_ndx (route_name)
);
CREATE TABLE TrainEvents
(
id INT NOT NULL AUTO_INCREMENT,
route_id INT NOT NULL,
arr_dt DATETIME NOT NULL,
dep_dt DATETIME NOT NULL,
...
PRIMARY KEY (id),
KEY arr_index (route_id,arr_dt),
KEY dep_index (route_id,dep_dt)
);
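With the single route_id design, finding departures means one range scan plus a join back to the lookup table. Here is a minimal sketch using Python's sqlite3 for illustration; the lookup-table column names (route_id, route_name) and the sample route name are assumptions for this example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TrainRoutes (
        route_id INTEGER PRIMARY KEY AUTOINCREMENT,
        route_name TEXT NOT NULL
    );
    CREATE TABLE TrainEvents (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        route_id INTEGER NOT NULL,
        arr_dt TEXT NOT NULL,
        dep_dt TEXT NOT NULL
    );
    INSERT INTO TrainRoutes (route_name) VALUES ('Station 21 -> Station 29');
    INSERT INTO TrainEvents (route_id, arr_dt, dep_dt)
        VALUES (1, '2014-04-30 10:00:00', '2014-04-30 10:01:00');
""")

# Departures in a 30-minute window, resolved to a human-readable route name
rows = conn.execute("""
    SELECT r.route_name, e.dep_dt
    FROM TrainEvents e
    JOIN TrainRoutes r ON r.route_id = e.route_id
    WHERE e.dep_dt BETWEEN '2014-04-30 09:45:00' AND '2014-04-30 10:15:00'
""").fetchall()
```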
You could also make the event record arrivals and departures separately.
CREATE TABLE TrainRoutes
(
route_id INT NOT NULL AUTO_INCREMENT,
route_name VARCHAR(128) NOT NULL,
...
PRIMARY KEY (route_id),
KEY route_name_ndx (route_name)
);
CREATE TABLE TrainEvents
(
id INT NOT NULL AUTO_INCREMENT,
route_id INT NOT NULL,
event_dt DATETIME NOT NULL,
event_type TINYINT NOT NULL, # 1 for arrival, 2 for departure
...
PRIMARY KEY (id),
KEY event_index1 (route_id,event_dt,event_type),
KEY event_index2 (route_id,event_type,event_dt)
);
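With arrivals and departures stored as separate rows, a departures-only lookup filters on event_type first, which is exactly the shape the (route_id,event_type,event_dt) index serves. A small sketch, again using Python's sqlite3 for illustration of the MySQL design:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TrainEvents (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        route_id INTEGER NOT NULL,
        event_dt TEXT NOT NULL,
        event_type INTEGER NOT NULL  -- 1 = arrival, 2 = departure
    );
    CREATE INDEX event_index2 ON TrainEvents (route_id, event_type, event_dt);
    INSERT INTO TrainEvents (route_id, event_dt, event_type) VALUES
        (1, '2014-04-30 10:00:00', 1),
        (1, '2014-04-30 10:01:00', 2);
""")

# Only departures (event_type = 2) on route 1 inside the window
deps = conn.execute("""
    SELECT event_dt FROM TrainEvents
    WHERE route_id = 1 AND event_type = 2
      AND event_dt BETWEEN '2014-04-30 09:45:00' AND '2014-04-30 10:15:00'
""").fetchall()
```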
I am just giving ideas; I'll leave the implementation to you. Perhaps, to give you some UI ideas, see the site I use to commute: http://as0.mta.info/mnr/schedules/sched_form.cfm . The results for the route you pick on that site show departures within a 5-hour window (2.5 hours before and 2.5 hours after).
Give it a Try !!!
Let's break this question up into a few parts.
Q: I need to insert 1mm rows a day. Is that a lot?
Not really. 1mm divided by 24 hours, divided by 60 minutes, divided by 60 seconds gives you about 12 inserts per second. For a rough sense of perspective, it's not unusual to see 1,000 inserts per second on typical commodity servers with no tuning.
Granted, your load won't be perfectly averaged out like that - you'll have bursts of load - but I wouldn't make database platform decisions based on less than 10k-20k inserts per second. Any platform out there will work fairly well.
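The back-of-the-envelope rate above, written out:

```python
# 1mm rows per day, spread evenly over a day's worth of seconds
rows_per_day = 1_000_000
seconds_per_day = 24 * 60 * 60  # 86,400
rate = rows_per_day / seconds_per_day
print(round(rate, 1))  # about 11.6 inserts/second on average
```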
Q: How should I structure the data?
Zoom out - don't think tables, think databases. If you're going to be keeping this data permanently, and it's truly insert-only with no updates, then you probably want to start a new database per time period. Your inserts may only go into one table in one database, but every year, create a new database (MyApp_2015) and seal the old 2014 data as read-only. You can stop backing it up (as long as you've still got one good final backup), stop doing index maintenance, statistics updates, etc.
The PHP will only ever have to know about the current database for inserts, making your design a lot easier. The archival process becomes a DBA task much later down the road as long as you go in knowing that there will be more than one database involved.
If you were doing more than 1,000 inserts per second sustained, and you wanted easier performance management, then I'd also suggest building sharding into the initial design regardless of the database platform. Don't get me wrong, any modern database can handle over 1,000 inserts per second, but designing sharding in now just gives you more flexibility later on. At 12 inserts per second, it's just not worth the design/testing hassle.
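The "PHP only knows the current database" idea boils down to deriving the insert target from the clock. A minimal sketch, assuming the MyApp_<year> naming convention suggested above:

```python
from datetime import date

def current_insert_database(today: date) -> str:
    """Name of the only database the application writes to."""
    return f"MyApp_{today.year}"

print(current_insert_database(date(2015, 3, 1)))  # MyApp_2015
```

Older databases (MyApp_2014 and earlier) never appear in the insert path; they become a read-only archival concern for the DBA.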
Q: How should I do reporting?
In an ideal world, reports would not be done against the live server. Run the reports against a restored or replicated copy of the database. This does two things: it reduces load on the live server, and it validates your backups, guaranteeing that you've got your valuable data elsewhere.
Best Answer
For optimal storage and later searching, I would consider using two tables.
The first table holds the event/schedule definition (frequency, interval, recurrences, etc.), with one row per event. The second table is maintained by a process that runs over the first table, calculating and inserting rows for future occurrences of each event.
Based on your system requirements, you can decide how far ahead the second table should be filled.
This makes it easy to later query for events running within a specific time frame.
This solution should be more elegant than trying to calculate the date range on the fly, which is more complex and definitely slower.
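The materialization step can be sketched in a few lines. This is a hedged illustration, not a full scheduler: the function name and fields (start, interval, horizon) are assumptions standing in for whatever the schedule-definition table stores.

```python
from datetime import datetime, timedelta

def expand_occurrences(start, interval, horizon):
    """Materialize concrete occurrences of a recurring event,
    from its start time up to (and including) the horizon."""
    occurrences = []
    t = start
    while t <= horizon:
        occurrences.append(t)
        t += interval
    return occurrences

# A train departing every 30 minutes, expanded two hours ahead:
occ = expand_occurrences(
    datetime(2014, 4, 30, 10, 0),
    timedelta(minutes=30),
    datetime(2014, 4, 30, 12, 0),
)
print(len(occ))  # 5 occurrences: 10:00, 10:30, 11:00, 11:30, 12:00
```

Each returned timestamp would become one row in the second table, so time-window queries reduce to a plain indexed range scan instead of recurrence math at query time.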