Should I normalize and transpose rows to columns or just create columns

database-design

I sort of know the answer to this but I want to see if I am over engineering this. Please note I've done many things like this in the past so although not an expert I am pretty good with sql server (not brent ozar good but pretty good).

In any event a client has come up to us with this funky spreadsheet that he wants converted using an application. The system consists of a a list of items where certain columns in the excel file are filled out by different people.

There is a design dilemma where one of the programmers has an idea of how to do it and the other one has a different idea. First let me give some background on what the excel file has which will help you understand the problem:

enter image description here

The above you see some information from column A to F. Columns A-C are not really editable, once they are in the system they are considered good. The person who enters that information does not know the answers to columns D,E,F.

In fact column D only someone in purchasing can answer, column E only our R&M engineer can answer, and column F only our Mechanical engineer can answer. In the old days (excel days) the person who generated this file would simply send this excel file to purchasing they would enter column D and send it back. Then the person who created this would send it to our R&M engineer to enter column E and he too would fill it out and send it back..and so on same thing with the mechanical engineer.

Fast forward today, the person who created this file decided he hates this process (And I can see why) and wants this automated a bit and inside of a new application. So one of the design decisions I made is to have a table that stores columns A, B, and C with a primary key.

LineItemMaster
ID
Ident
Manufacturer
Description

I wanted to separate the other columns as actual rows in another table (I'll explain more of this later). So example records for the above table.

LineItemMasterID Ident Manufacturer  Description
1                 1021  Acme Co       12 mm CAP
2                 1022  Acme Co       10 mm CAP
3                 1023  Foo Co        Socket
4                 1024  Foo Co        Ball Screw

For simplicity please ignore that the manufacturer is repeating and the fact that the ID could simply be the Ident number to have uniqueness, ignore these issues for now. With this table I would then introduce the various column types that need to be filled in as a FK, as simple as TypeID, Type for a Types table.

TypeID Type
1      Price
2      MTBF
3      Tag

I would then introduce another table that would point the line items from the LineItemMaster with the types and allow for a value entry. I know this can get scary because this is starting to look like a Key Value table and those could get ugly, but allow me to continue, so we'd have this:

LineItemMasterValue table

ID  LineItemMasterID  TypeID   Value
1          1            1       12.00
2          1            2       NULL
3          1            3       NULL
4          2            1       NULL
5          2            2       100

In this case line item master 1 (which is the 12mm CAP) has three records each for price, mtbf, tag. Pretty straight forward. This would allow me to query the items correctly for that person and have them update their specific values. This would also ensure that logging, such as who modified the record, is correct at the record level, etc.

Now the other option (option 2) is to simply have a table that looks just like our excel file where we have

Ident
Manufacturer
Description
Price
MTBF
Tag

All in one table and my colleague insists that we would just allow them to update their respective columns only. But in this case our code would have to pass parameters or come up with dynamic sql to only select specific columns from this dataset. Not to mention the UX would become much more complicated. It also suffers from if we grow horizontally and there is a new column added next to F (Tag) this would be a design change.

I pointed out yes that this second method is simpler in terms of its a pretty flat structure vs we would have to transpose row data (the line item master types table) into columns (but hey that's part of the user experience, at least the database is more normalized).

I also pointed out that the ModifiedBy field and ModifiedOn field would have the last person who made their edit on that row in the second proposed solution. Sure we could create a log table and any time a change is made log it in a seperate table. But in this case if we took proposal one the edited by field would be right because it would be stored in the line item master type table of who made the edit.

So I know both solutions could work…I just keep favoring proposal one just because of my database experience. We could make it work with proposal two but I continue to feel it is a hack. I didn't want to give much more details as I am giving a simple case, but imagine this has to reference separate other relationships such as "Tasks". If we went with proposal one we'd have one row that has an ID and I'd link that to the task, but that same task could happen thrice (R&M, Pricing, Tag #). The second solution because there are three separate records for the three different people I could attach each task individually and it would make more sense

What can you guys share or think?

Best Answer

This is largely opinion based as I'm sure a lot of design questions are. As you stated, it can be made to work in both scenarios.

From a growth perspective, I definitely agree with your choice to supply the values in a vertical table with identifiers for each value type. It's a more normalized approach and won't require much dev work if/when more fields are added later. That's really the main benefit. As business needs change(they always do), how much work is it going to be to change or add. A normalized approach is almost always better for changing business requirements.

The other option is simple to be sure. Writing reports against data like that would be very simple. That's really the only advantage that comes to mind. Unfortunately it's not a good one. This is most definitely an OLTP application we are talking about. So we should be focused on user input, not reporting. As long as the data makes sense, reporting shouldn't be that hard anyways.

One other point. We have an application onsite that was designed by a business user with no actual developer input.
It was designed as a single table with no primary key and about 100 columns. Everything is stored on that one table. Normalization WHO CARES!!! Now that the business is tired of the shortcomings of the old system and wants something better, there is almost nothing we can do. The old data can not be directly translated to the new system because the requirements are sooooo different. All of that data has become almost entirely useless.

Anyways, the point of my story is, don't take these decisions lightly. They tend to come back and bite you.

To add a bit of ammo to your point, try asking the business the following questions, Will we ever want to record who filled in values or when they were updated? Will we ever want to record multiple users answers for each field? If the answer of either of these is yes, even if it's a remote possibility, the normalized approach is better.