Database Design – How to Do Conditional JOIN or SELECT Based on Table Data

database-designeavmariadb

I think I have a problem where EAV is the best choice.

I am planning my tables as follows.

motherboard

motherboard_id | model_name
int            | varchar

Example:

motherboard_id | model_name
----------------------------
1              | Asus Rampage
2              | Intel 3X
3              | Gigabit 22P

attribute

attribute_id  | attribute_name | attribute_type
int           | varchar        | enum('str','int','float')

Example:

attribute_id   | attribute_name| attribute_type
-------------------------------------------------
1              | form_factor   | int
2              | max_ram       | int
3              | ram_type      | str
4              | weight        | float
5              | CPU_socket    | str
6              | PCH           | str

motherboard_attribute_int

attribute_id  | motherboard_id | value
int           | int            | int

Example:

attribute_id   | motherboard_id | value
-----------------------------------------
1              | 1              | 1
2              | 1              | 16384
1              | 2              | 1
2              | 2              | 32,768
1              | 3              | 2
2              | 3              | 8192

motherboard_attribute_str

attribute_id  | motherboard_id | value
int           | int            | varchar

Example:

attribute_id   | motherboard_id | value
--------------------------------------------
3              | 1              | DDR3
5              | 1              | LGA2011
6              | 1              | Intel P55
3              | 2              | DDR3
5              | 2              | LGA3011
3              | 3              | DDR2
5              | 3              | Socket 2

motherboard_attribute_float

attribute_id  | motherboard_id | value
int           | int            | float

Example:

attribute_id   | motherboard_id | value
------------------------------------------
4              | 1              | 250.2
4              | 2              | 250.2
4              | 2              | 110.5

I will have to query on all attributes (use WHERE clause). For example –

Find all motherboards with weight >150g and max_ram>16M

So, I need to query the attribute table and discover that max_ram in an int and weight is a float. So, I have to join motherboard with motherboard_attribute_int and motherboard_attribute_float

Is there a way to do this?

Is there an alternate database design that would serve this kind of data better?

Best Answer

What you really want is a table full of nullable fields, EAV is hard to get right, and with EAV you literally need three columns, not 6. It would be much easier to just keep adding nullable columns for your needs at this time.

So there should not be a performance and scalability issue having a table with 20 or 30 columns? What am I trading off when I make a 30 column table?

What are you trading off? Nothing really. Performance and scalability issues are much harder to interpret and diagnose with an EAV format over the traditional wide-table format. You really only end up with tables with lots of null values that you just need good indexes to find the things you want quickly, which is a lot easier than done with an EAV.

Running 1 EAV table per data type is just as horrid as the five+ column design you were describing before from a perf and maint standpoint. You really don't want to do this. You really want to just add a lot of nullable columns to one wide table. You're shooting yourself in the foot and asking us how to bandage it before you begin. EAV is not the design pattern you want right now, in part because you don't understand how the model works. Model your business data, then model your database.

In your case, you want a table for motherboards with vendor, board name/model number, etc. I've modeled a simple table below. EAV would then be used with this table to model things like "link for driver downloads" which are very rarely going to be needed. Think about the main table as being searching details, and the EAV being details that you only need when you show a secondary tab on the UI.

create table motherboards (
   id bigint not null auto_increment
 , vendor_name varchar(100) not null -- would also be good as a lookup column to another table, making this an int or bigint instead
 , board_name varchar(100) not null
 , revision_number varchar(100) null
 , cpu_socket_type not null -- can be an int lookup to another table or a text field, I leave this to you based on the schema you want, but either is acceptable, as is an int with an in-app enum lookup
 , cpu_socket_count int not null default 1
 , ram_socket_type not null -- again, can be one of so many values, you decide the best type of storage
 , ram_socket_count tinyint not null default 1 -- here's where things get tricky, it's possible on some systems (enterprise grade) to have different RAM sockets for different reasons, but we skip that because that's a lot of detail. Wait till you have to cross that bridge, mkay?
 , northbridge_chipset varchar(100) not null -- this makes the most sense to me, but see the cpu_socket_type comment. There are a handful of common ones, so text may be easier to deal with. 
 , southbridge_chipset varchar(100) null -- see above

-- This is now the additional details for others
 , rear_port_usb tinyint not null default 2
 , usb_headers tinyint not null default 0
 , upc_number varchar(20) null
 , pcie_socket_count tinyint not null default 0
 , pcie_x2_socket_count tinyint not null default 0
 , pcie_x4_socket_count tinyint not null default 0
 , pcie_x8_socket_count tinyint not null default 0
 , pcie_x16_socket_count tinyint not null default 0
 , sata_headers_count tinyint not null default 0
 , pci_socket_count tinyint not null default 0
 , isa_socket_count tinyint not null default 0
 , rear_port_matrix -- some set of fields to indicate what is on the back panel where people can have access. I would probably either make up a standard "dictionary" of these and do it like an enum/lookup table, or list one per port. Where do you put parallel ports on this lookup list? How many USB are exposed via the backplate? How many USB are exposed on headers on the motherboard?
)

create table motherboards_details (
   id bigint not null (fk back to motherboards)
 , name varchar(100) not null
 , value varchar(100) not null
 , create_date datetime not null default CURRENT_TIMESTAMP -- always good to have for later tracking
)

And that's it. No more EAVs. If you need to use it as a float or an int, you do that in the application.

I have some other suggestions, if you are just hell-bent on using an EAV for tracking details, such as moving all of those socket bits to an EAV, but trust me when I tell you that will become much more tedious than what I gave you. This format lets you search for the details you want (like in a picker UI) and lets you build indexes for commonly searched features once you get a very large table. The only indexes I would suggest are on the board name, the socket type and count, and the ID for lookups. Everything else I would index after putting the system under load with actual data.

The EAV model really doesn't fit this business model right now, so without seeing an actual EAV schema intention, I would recommend you approach it from the style I've outlined above, and ignore the column widths.

The approach I've outlined above is about 500 bytes per record so far, which is relatively tiny. The indexes alone would be hard to maintain.

I hope this helps you reconsider the approach to your problem design, and we as always as a community are happy to help you flesh out the details.