Mysql – Best practices, MySQL table with 150+ columns, many null

database-designMySQLperformancequery-performance

I have a table storing property data for thousands of properties, all coming from a feed. (updating at least once a day).

I'm certainly not an expert with databases and would appreciate some guidance on the best way to approach building the property table. The challenge is that each property has a LOT of attributes which might or might not be included. Each attribute value is potentially unique, so a relational table doesn't seem like it would provide any benefit.

Currently, my plan for the table is simple, creating a wide table with a lot of potentially NULL columns. eg:

id - int(not null)
date - datetime(not null)
attribute1 - varchar(null)
attribute2 - varchar(null)
attribute3 - int(null)
attribute4 - bool(null)
ect..

Is there a better way to set this up? Each attribute is uniquely associated with the property, so keeping it all in a single table make sense to me.

Will having that many columns, even if most are NULL, cause me issues when thousands of records exist in that table? Each day, I need to do select several select queries on this table, returning hundreds to thousands of records each time.

Any advice or direction on what to research is greatly appreciated!

Best Answer

Consider an Entity-Attribute-Value design. The general concept is that you put all of the data in a very long, narrow table which may take the form of:

CREATE TABLE dbo.PropertyAttributes
(
PropertyID   INT NOT NULL
             REFERENCES dbo.Properties(PropertyID),
AttributeID  INT NOT NULL
             REFERENCES dbo.Attributes(AttributeID),
StringValue  NVARCHAR(1024),
NumericValue DECIMAL(16,4),
DateValue    SMALLDATETIME,
ModifiedDate SMALLDATETIME NOT NULL
             DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (PropertyID, AttributeID)
);

Then you store metadata about the attributes. At a minimum you'd need something like this:

CREATE TABLE dbo.Attributes
(
AttributeID INT PRIMARY KEY,
PropertyID  INT NOT NULL 
            REFERENCES dbo.Properties(PropertyID),
[Name]      NVARCHAR(32) NOT NULL UNIQUE,
DataTypeID  TINYINT NOT NULL -- 1 = string, 2 = numeric, 3 = date
);

Taking this approach offers some benefits because when new attributes show up in the data, you don't have to modify the database structure to house the data and you don't have to construct new queries to get at the new columns you created.

Like other aspects of life, these benefits come with tradeoffs. Creating a very wide view requires you to Pivot the table which can be an expensive query to run.

Aaron Bertrand, who posts here quite a bit, described his experience with this approach in 2009. It's still worth a read today.

MDCCL's suggestion to review attributes and potentially decompose the most relevant ones is a good one, move the most commonly used attributes into the base Properties table so you pivot the EAV table less frequently.