Database Design – How to Do Conditional JOIN or SELECT Based on Table Data

database-designeavmariadb

I think I have a problem where EAV is the best choice.

I am planning my tables as follows.

motherboard

motherboard_id | model_name
int            | varchar

Example:

motherboard_id | model_name
----------------------------
1              | Asus Rampage
2              | Intel 3X
3              | Gigabit 22P

attribute

attribute_id  | attribute_name | attribute_type
int           | varchar        | enum('str','int','float')

Example:

attribute_id   | attribute_name| attribute_type
-------------------------------------------------
1              | form_factor   | int
2              | max_ram       | int
3              | ram_type      | str
4              | weight        | float
5              | CPU_socket    | str
6              | PCH           | str

motherboard_attribute_int

attribute_id  | motherboard_id | value
int           | int            | int

Example:

attribute_id   | motherboard_id | value
-----------------------------------------
1              | 1              | 1
2              | 1              | 16384
1              | 2              | 1
2              | 2              | 32,768
1              | 3              | 2
2              | 3              | 8192

motherboard_attribute_str

attribute_id  | motherboard_id | value
int           | int            | varchar

Example:

attribute_id   | motherboard_id | value
--------------------------------------------
3              | 1              | DDR3
5              | 1              | LGA2011
6              | 1              | Intel P55
3              | 2              | DDR3
5              | 2              | LGA3011
3              | 3              | DDR2
5              | 3              | Socket 2

motherboard_attribute_float

attribute_id  | motherboard_id | value
int           | int            | float

Example:

attribute_id   | motherboard_id | value
------------------------------------------
4              | 1              | 250.2
4              | 2              | 250.2
4              | 2              | 110.5

I will have to query on all attributes (use WHERE clause). For example –

Find all motherboards with weight >150g and max_ram>16M

So, I need to query the attribute table and discover that max_ram in an int and weight is a float. So, I have to join motherboard with motherboard_attribute_int and motherboard_attribute_float

Is there a way to do this?

Is there an alternate database design that would serve this kind of data better?

Best Answer

What you really want is a table full of nullable fields, EAV is hard to get right, and with EAV you literally need three columns, not 6. It would be much easier to just keep adding nullable columns for your needs at this time.

So there should not be a performance and scalability issue having a table with 20 or 30 columns? What am I trading off when I make a 30 column table?

What are you trading off? Nothing really. Performance and scalability issues are much harder to interpret and diagnose with an EAV format over the traditional wide-table format. You really only end up with tables with lots of null values that you just need good indexes to find the things you want quickly, which is a lot easier than done with an EAV.

Running 1 EAV table per data type is just as horrid as the five+ column design you were describing before from a perf and maint standpoint. You really don't want to do this. You really want to just add a lot of nullable columns to one wide table. You're shooting yourself in the foot and asking us how to bandage it before you begin. EAV is not the design pattern you want right now, in part because you don't understand how the model works. Model your business data, then model your database.

In your case, you want a table for motherboards with vendor, board name/model number, etc. I've modeled a simple table below. EAV would then be used with this table to model things like "link for driver downloads" which are very rarely going to be needed. Think about the main table as being searching details, and the EAV being details that you only need when you show a secondary tab on the UI.

create table motherboards (
   id bigint not null auto_increment
 , vendor_name varchar(100) not null -- would also be good as a lookup column to another table, making this an int or bigint instead
 , board_name varchar(100) not null
 , revision_number varchar(100) null
 , cpu_socket_type not null -- can be an int lookup to another table or a text field, I leave this to you based on the schema you want, but either is acceptable, as is an int with an in-app enum lookup
 , cpu_socket_count int not null default 1
 , ram_socket_type not null -- again, can be one of so many values, you decide the best type of storage
 , ram_socket_count tinyint not null default 1 -- here's where things get tricky, it's possible on some systems (enterprise grade) to have different RAM sockets for different reasons, but we skip that because that's a lot of detail. Wait till you have to cross that bridge, mkay?
 , northbridge_chipset varchar(100) not null -- this makes the most sense to me, but see the cpu_socket_type comment. There are a handful of common ones, so text may be easier to deal with. 
 , southbridge_chipset varchar(100) null -- see above

-- This is now the additional details for others
 , rear_port_usb tinyint not null default 2
 , usb_headers tinyint not null default 0
 , upc_number varchar(20) null
 , pcie_socket_count tinyint not null default 0
 , pcie_x2_socket_count tinyint not null default 0
 , pcie_x4_socket_count tinyint not null default 0
 , pcie_x8_socket_count tinyint not null default 0
 , pcie_x16_socket_count tinyint not null default 0
 , sata_headers_count tinyint not null default 0
 , pci_socket_count tinyint not null default 0
 , isa_socket_count tinyint not null default 0
 , rear_port_matrix -- some set of fields to indicate what is on the back panel where people can have access. I would probably either make up a standard "dictionary" of these and do it like an enum/lookup table, or list one per port. Where do you put parallel ports on this lookup list? How many USB are exposed via the backplate? How many USB are exposed on headers on the motherboard?
)

create table motherboards_details (
   id bigint not null (fk back to motherboards)
 , name varchar(100) not null
 , value varchar(100) not null
 , create_date datetime not null default CURRENT_TIMESTAMP -- always good to have for later tracking
)

And that's it. No more EAVs. If you need to use it as a float or an int, you do that in the application.

I have some other suggestions, if you are just hell-bent on using an EAV for tracking details, such as moving all of those socket bits to an EAV, but trust me when I tell you that will become much more tedious than what I gave you. This format lets you search for the details you want (like in a picker UI) and lets you build indexes for commonly searched features once you get a very large table. The only indexes I would suggest are on the board name, the socket type and count, and the ID for lookups. Everything else I would index after putting the system under load with actual data.

The EAV model really doesn't fit this business model right now, so without seeing an actual EAV schema intention, I would recommend you approach it from the style I've outlined above, and ignore the column widths.

The approach I've outlined above is about 500 bytes per record so far, which is relatively tiny. The indexes alone would be hard to maintain.

I hope this helps you reconsider the approach to your problem design, and we as always as a community are happy to help you flesh out the details.

Related Solutions

Sql-server – Foreign key constraint based on conditional column

How can I set up a foreign key constraint that will know which table to be linked to?

Either you can't, or you shouldn't. :)

The standard way to structure this is to have a parent Entity table, and a table-per-child for each of University, Program, and Course. The Entity table optionally contains a denormalized column that specifies the type of entity -- this has a bunch of advantages, so I recommend it. Also, at least in the development stages, I would recommend creating triggers to validate that rows in this structure end up in the right place (i.e., for a given EntityID, there should be only 1 matching row in all the child tables).

Then all you do is create the foreign key to the primary key of Entity.

What I've described looks like this (SQL Server syntax):

CREATE TABLE dbo.EntityType
(
    EntityTypeID tinyint NOT NULL
        PRIMARY KEY
);

CREATE TABLE dbo.Entity
(
    EntityID int NOT NULL IDENTITY
        PRIMARY KEY,
    EntityTypeID tinyint NOT NULL
        FOREIGN KEY REFERENCES dbo.EntityType(EntityTypeID)
);

CREATE TABLE dbo.University
(
    EntityID int NOT NULL
        PRIMARY KEY
        FOREIGN KEY REFERENCES dbo.Entity(EntityID)
);

CREATE TABLE dbo.LearningContext
(
    LearningContextID int NOT NULL IDENTITY
        PRIMARY KEY,
    EntityID int NOT NULL
        FOREIGN KEY REFERENCES dbo.Entity(EntityID)
);

This is the equivalent of subclassing in a database, if you're familiar with client-side programming constructs.

Note how this structure easily allows adding a new derived Entity subtype without changing the existing schema objects.

Database Design – How to Handle Conditional Fields

Your approach is reasonable. Put a check constraint on the ASSET table to ensure exactly one of your foreign keys is ever set at any point in time.

The flag is redundant since the same information is captured by which of the foreign keys is NULL. Keeping it just adds maintenance overhead to the application. Eventually it will become inconsistent and then you'll have a mess to sort out.

An alternative design is to combine some of the information from SHOP and JOB into a LOCATION table. The schema then becomes

ASSET(ASSET_ID, LOCATION_ID)
LOCATION(LOCATION_ID, NAME)
JOB(JOB_ID, LOCATION_ID)
SHOP(SHOP_ID, LOCATION_ID)

Put one row in LOCATION for each "job site or .. warehouse or vendor facility," as you say in your question.

motherboard

Example:

attribute

Example:

motherboard_attribute_int

Example:

motherboard_attribute_str

Example:

motherboard_attribute_float

Example:

Best Answer

Related Solutions

Sql-server – Foreign key constraint based on conditional column

Database Design – How to Handle Conditional Fields

Related Question