Mysql – Row for each date, or 31 columns

database-designMySQLnormalization

Currently I have a system whose front-end looks like this:

The following query:

SELECT      `l`.*, `s`.`month`, `s`.`year`
FROM        `summarylines` `l`
JOIN        `summaries` `s`
ON          `l`.`sumid` = `s`.`id`
ORDER BY    `l`.`sumid` ASC

Produces the following result:

I have a requirement to change the system to handle individual dates, rather than weeks. So on the front-end the columns would be each date, e.g. 2nd Jan, 6th Jan, 10th Jan, 12th Jan, 18th Jan, 20th Jan, 26th Jan, 27th Jan.

It is of course a major overhaul of both the back-end and the DB structure.

My question is: should I use a database with 31 columns for each possible day of the month? Any unused days would be NULL for that field (as it is now for any unused weeks). The positive in this approach is that I only need 1 row per item per month. E.g.

+--------+-------+--------+------+-----+------+------+-----+----+-----+------+-----+
| lineid | sumid | itemid |  1   |  2  |  3   |  4   |  5  | 6  |  7  |  8   | ... |
+--------+-------+--------+------+-----+------+------+-----+----+-----+------+-----+
|   1195 | 15123 |    165 | NULL |  66 | NULL | NULL |  72 | 88 |  95 | NULL | ... |
|   1196 | 15123 |    223 | NULL | 101 | NULL | NULL | 141 | 85 | 110 | NULL | ... |
+--------+-------+--------+------+-----+------+------+-----+----+-----+------+-----+

Or should I have a narrow table, with lots of rows? E.g.

+--------+-------+--------+-----+-----+
| lineid | sumid | itemid | day | qty |
+--------+-------+--------+-----+-----+
|   1195 | 15123 |    165 | 2   | 66  |
|   1196 | 15123 |    165 | 5   | 72  |
|   1197 | 15123 |    165 | 6   | 88  |
|   1198 | 15123 |    165 | 7   | 95  |
|   1199 | 15123 |    165 | ... | ... |
|   1200 | 15123 |    223 | 2   | 101 |
|   1201 | 15123 |    223 | 5   | 141 |
|   1202 | 15123 |    223 | 6   | 85  |
|   1203 | 15123 |    223 | 7   | 110 |
|   1204 | 15123 |    223 | ... | ... |
+--------+-------+--------+-----+-----+

A possible negative of this is that the back-end's SQL is currently coded in such a way that some of the existing code could just be altered if using a 31-col table. It would have to be completely rewritten using this approach.

FYI, most customers only have 1 or 2 deliveries per week, but most have 5-10 different items.

Best Answer

It depends on what you are going to do with the data, as it always does. It's usually dangerous to assume that the data is never going to be used in some other way. Your question illustrates this nicely. The original design of "summaries" assumed that weekly summarization would always be the intended use, and now you want daily summarization.

The structure of the summaries table (or view) is what can be called crosstabulation. Crosstabulation is frequently used in spreadsheets as a convenient way of displaying data that has been summarized along two dimensions. See the pivot table feature for more details.

Crosstabulated data is not in first normal form. The folks who invented first normal form were thinking about something when they invented it. They were thinking about keyed access to all data. The power of keyed access to all data is central to the simplicity and power of the relational data model. You can look up more detail under first normal form.

Based on this, I would say that one row per line item is the best structure for this data, assuming that no finer granularity can be obtained. If you want to summarize by day, or by week, or by month or whatever, you can summarize by using a pivot operation in a view.

This will take some computer resources, to be sure. Computer resources are usually cheap, compared to "we can't do what you want because the data isn't organized that way".

As far as existing code goes, just create views that make the data look the way it needs to look for the existing code, and tweak the existing code to use the view. Unless the functionality of the existing code needs to be changed as well.

Edit based on comments:

You probably know more about cross tabulation than you think you do. Look at the diagram you provided at the top of your question. That's a cross tabulation. It tabulates by product down the left side, and by week number across the top. The cells contain some kind of aggregate statistic, like the total count of bath sheets sold in week 2.

It's probably a summary of invoice data, or something like that.