I have a table storing property data for thousands of properties, all coming from a feed. (updating at least once a day).
I'm certainly not an expert with databases and would appreciate some guidance on the best way to approach building the property table. The challenge is that each property has a LOT of attributes which might or might not be included. Each attribute value is potentially unique, so a relational table doesn't seem like it would provide any benefit.
Currently, my plan for the table is simple, creating a wide table with a lot of potentially NULL columns. eg:
id - int(not null)
date - datetime(not null)
attribute1 - varchar(null)
attribute2 - varchar(null)
attribute3 - int(null)
attribute4 - bool(null)
ect..
Is there a better way to set this up? Each attribute is uniquely associated with the property, so keeping it all in a single table make sense to me.
Will having that many columns, even if most are NULL, cause me issues when thousands of records exist in that table? Each day, I need to do select several select queries on this table, returning hundreds to thousands of records each time.
Any advice or direction on what to research is greatly appreciated!
Best Answer
Consider an
Entity-Attribute-Value
design. The general concept is that you put all of the data in a very long, narrow table which may take the form of:Then you store metadata about the attributes. At a minimum you'd need something like this:
Taking this approach offers some benefits because when new attributes show up in the data, you don't have to modify the database structure to house the data and you don't have to construct new queries to get at the new columns you created.
Like other aspects of life, these benefits come with tradeoffs. Creating a very wide view requires you to
Pivot
the table which can be an expensive query to run.Aaron Bertrand, who posts here quite a bit, described his experience with this approach in 2009. It's still worth a read today.
MDCCL's suggestion to review attributes and potentially decompose the most relevant ones is a good one, move the most commonly used attributes into the base Properties table so you pivot the EAV table less frequently.