There are maybe several hundred writes a day, versus the tens of thousands of reads. My databases are located on an SSD drive.
Based on this statement, let's play with some numbers. Let's say there are 500 writes a day and 20,000 reads a day. That works out to the following:
- 97.56% reads per day
- 2.44% writes per day
- 40 reads / write
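If you want to sanity-check those percentages, the arithmetic is just each operation type's share of the 20,500 total daily operations. A quick way to reproduce it (using the assumed 500/20,000 figures from above):

```sql
-- Reproduce the read/write split from the assumed 500 writes and 20,000 reads per day
SELECT
    ROUND(20000 / (20000 + 500) * 100, 2) AS reads_pct,       -- 97.56
    ROUND(500   / (20000 + 500) * 100, 2) AS writes_pct,      -- 2.44
    20000 / 500                           AS reads_per_write; -- 40
```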
As much as I love InnoDB, I would have to choose MyISAM in this case for several reasons:
REASON #1
You only have 150 MiB of MyISAM tables (70 MiB data, 80 MiB indexes)
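If you want to verify those data/index figures on your own server, you can pull them straight from information_schema (replace mydb with your actual schema name; the query itself is standard MySQL):

```sql
-- Per-table data and index size (in MiB) for the MyISAM tables in one schema
SELECT table_name,
       ROUND(data_length  / 1024 / 1024, 1) AS data_mib,
       ROUND(index_length / 1024 / 1024, 1) AS index_mib
FROM information_schema.tables
WHERE table_schema = 'mydb'   -- hypothetical schema name
  AND engine = 'MyISAM';
```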
REASON #2
InnoDB indexes tend to get very bloaty because every secondary index entry carries a copy of the clustered index (primary) key. A secondary-index lookup therefore always turns into a double index lookup. That overhead may be tolerable for large, write-heavy datasets, but your workload is almost entirely reads.
REASON #3
InnoDB tablespaces tend to get very bloaty because MVCC row versions are created and discarded without the disk space ever being reclaimed automatically. All of this can be avoided with MyISAM.
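To make that concrete: with InnoDB you would typically have to reclaim the space yourself by rebuilding the table (and the on-disk file only shrinks if innodb_file_per_table is enabled). A sketch of that manual step, using a hypothetical table name:

```sql
-- Manual space reclaim for a bloated InnoDB table; nothing does this automatically.
-- For InnoDB, OPTIMIZE TABLE is executed as a table rebuild (recreate + analyze).
OPTIMIZE TABLE mydb.mytable;
```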
REASON #4
InnoDB protects individual rows by maintaining MVCC information for transaction control. The MVCC overhead generated by a day's worth of reads would probably be greater than 150 MiB.
I can probably name 2 or 3 more reasons, but let's cut to the chase: Is there anything that can improve performance for MyISAM in your case? Why, yes there is.
You said the following:
The table is about 260k rows in size, with 28 fields which for the most part is varchars and ints
If you have many VARCHARs, there is something you can do to increase read/write performance. For any MyISAM table mydb.mytable, run this command:
ALTER TABLE mydb.mytable ROW_FORMAT=Fixed;
What will this do? It will treat all VARCHARs as CHARs, so every row will be exactly the same length. This will increase disk space usage by 80%-100%; in your case, let's assume it doubles your 150 MiB of MyISAM tables to 300 MiB. Where is the benefit? Your MyISAM table can now be read and written anywhere from 20%-30% faster without changing anything else. I learned that from pages 72-73 of MySQL Database Design and Tuning.
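If you try the ROW_FORMAT change, you can confirm it took effect (and see how much the table grew) with SHOW TABLE STATUS; mydb.mytable is the same placeholder name as above:

```sql
-- Compare Row_format, Avg_row_length and Data_length before and after the ALTER
SHOW TABLE STATUS FROM mydb LIKE 'mytable';
```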
I have written about this in the past.
Making one large table would be a mistake: you'll have a table with lots of NULL values and it would soon evolve into a maintenance nightmare. And about saving the data as XML or JSON -- (slaps your wrist) that's for even considering it.
Every Resulttype should get its own table. Each of these tables has a primary key over an ID column, and (this is the trick!) all of these IDs are taken from the same sequence.
Then you only need a linking table with the columns projectid and resultid (and optionally resulttype). Since you will always be looking up by primary key, even a search across all of the tables stays fast.
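As a rough sketch of that trick (the table and column names are invented, and I'm assuming an engine with native sequences such as MariaDB 10.3+, whose syntax is shown here; on older MySQL you would emulate the sequence with a one-row counter table):

```sql
-- One shared sequence feeds the IDs of every Resulttype table,
-- so a resultid is unique across all of them.
CREATE SEQUENCE result_id_seq;

CREATE TABLE result_type_a (
    id      BIGINT PRIMARY KEY DEFAULT NEXT VALUE FOR result_id_seq,
    payload VARCHAR(255)
);

CREATE TABLE result_type_b (
    id          BIGINT PRIMARY KEY DEFAULT NEXT VALUE FOR result_id_seq,
    measurement INT
);

-- The linking table only needs the project and the (globally unique) result id
CREATE TABLE project_results (
    projectid  INT    NOT NULL,
    resultid   BIGINT NOT NULL,
    resulttype VARCHAR(30),   -- optional, as noted above
    PRIMARY KEY (projectid, resultid)
);
```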
Since Consumers and Merchants share some common data as Users, they should be in a single base table, as you have already mentioned.
An EAV implementation can be successful under some well planned and controlled use. But I see nothing in your question that would imply that an EAV solution is the right answer.
In fact, as you move further in your development you might find that not only do Merchants have some additional attributes, but so do Consumers. If so, you might want a simple Supertype-Subtype model.
That would give you 3 tables that are related as shown below:
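Since I can't embed the original diagram here, a minimal DDL sketch of that shape follows; the non-key columns are invented purely for illustration:

```sql
-- Supertype: everything common to all users
CREATE TABLE Users (
    user_id INT PRIMARY KEY,
    name    VARCHAR(100),
    email   VARCHAR(255)
);

-- Subtype: only the attributes specific to consumers
CREATE TABLE Consumers (
    user_id            INT PRIMARY KEY,
    preferred_currency CHAR(3),
    FOREIGN KEY (user_id) REFERENCES Users(user_id)
);

-- Subtype: only the attributes specific to merchants
CREATE TABLE Merchants (
    user_id       INT PRIMARY KEY,
    business_name VARCHAR(200),
    tax_id        VARCHAR(50),
    FOREIGN KEY (user_id) REFERENCES Users(user_id)
);
```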
If this works for you, much of the time you would only access the Users table, but when you needed additional information for a Consumer or a Merchant you would join to those detail tables as well.
This is a pretty simple approach and avoids unneeded duplication of data.
For a perspective on EAV implementations, you might look at Aaron Bertrand's post, which explains the advantages and the limitations.
Note that in that case it was rapidly changing requirements that made EAV a good choice.