Notes / description string in another table in a relational database

database-design

I'm primarily a non-database developer, however I am familiar with normalization techniques.

A coworker was suggesting that it is correct for a table to store 'description' and 'notes' as a foreign key rather than in the table for performance reasons, given that some of the records will contain NULL (or empty string in this case), and presumably will be split across pages. This conversation occurred after we noticed that the integrity of the one-to-one table design was broken with a duplicate record. It seems to me that if the developer who designed the column put it in the source table, all of the code that handles the one-to-one integrity would be unnecessary (and would have not caused this bug)

Pros of putting Nvarchar in a table

  • never have the risk of duplication
  • easier to work with, easy queries

Cons of putting Nvarchar in a table

  • after insertion, if the notes field grows, might cause a page split
    and worsened performance when querying the table.

Additional question: when putting notes in a seperate table, is the design closer to 1st normal form?

related, which seem to defend putting the strings in the source table instead of a FK:
https://stackoverflow.com/questions/9540738/nchar-vs-nvarchar-performance
Are relations slower than a big, inefficient table?
https://stackoverflow.com/questions/1412040/do-i-need-a-separate-table-for-nvarcharmax-descriptions

Best Answer

A coworker was suggesting that it is correct for a table to store 'description' and 'notes' as a foreign key rather than in the table for performance reasons, given that some of the records will contain NULL (or empty string in this case), and presumably will be split across pages.

Possibly, depending on RDBMS, the exact data types, and what the performance goal is.

For SQL Server, which it sounds like you're using, you can use sp_tableoption with large value types out of row to keep a LOB (nvarchar(MAX) in this case) column in the table you want (logically), while pushing the bulk of the data storage onto separate pages (physically).

This will make accessing the LOB data slower, but speed up access to only the base data (probably moreso for scanning) as the tradeoff. Depending on the average LOB size, this setting may not make a significant difference. You will have to test this for your exact scenario to see if it improves your workload.

In any event, as long as you want to keep the one-to-one relationship, there's no reason to logically separate the values from the base table because there is an available way around the problem in SQL Server, assuming you're using the MAX type variant. (Note: if you're currently using a fixed-length field, you could switch to the MAX type and add a check constraint to limit the length.)

This conversation occurred after we noticed that the integrity of the one-to-one table design was broken with a duplicate record.

It seems like the design of these structures is smelly. Are you keeping a single table to store all of these fields, which are then referenced from the parent tables?

A better design would be to only separate the tables where there's an actual need (again, as I mentioned, this may not even be required) with a declarative one-to-one relationship, such as repeating the primary key of the parent table as the primary key in the LOB storage table.

It seems to me that if the developer who designed the column put it in the source table, all of the code that handles the one-to-one integrity would be unnecessary (and would have not caused this bug)

Quite possibly. The fewer moving parts there are in a system, the more likely it is to be reliable, probably with fewer bugs as a result of complexity.