I think I have a problem where EAV is the best choice.
I am planning my tables as follows.
motherboard
motherboard_id | model_name
int | varchar
Example:
motherboard_id | model_name
----------------------------
1 | Asus Rampage
2 | Intel 3X
3 | Gigabit 22P
attribute
attribute_id | attribute_name | attribute_type
int | varchar | enum('str','int','float')
Example:
attribute_id | attribute_name| attribute_type
-------------------------------------------------
1 | form_factor | int
2 | max_ram | int
3 | ram_type | str
4 | weight | float
5 | CPU_socket | str
6 | PCH | str
motherboard_attribute_int
attribute_id | motherboard_id | value
int | int | int
Example:
attribute_id | motherboard_id | value
-----------------------------------------
1 | 1 | 1
2 | 1 | 16384
1 | 2 | 1
2 | 2 | 32,768
1 | 3 | 2
2 | 3 | 8192
motherboard_attribute_str
attribute_id | motherboard_id | value
int | int | varchar
Example:
attribute_id | motherboard_id | value
--------------------------------------------
3 | 1 | DDR3
5 | 1 | LGA2011
6 | 1 | Intel P55
3 | 2 | DDR3
5 | 2 | LGA3011
3 | 3 | DDR2
5 | 3 | Socket 2
motherboard_attribute_float
attribute_id | motherboard_id | value
int | int | float
Example:
attribute_id | motherboard_id | value
------------------------------------------
4 | 1 | 250.2
4 | 2 | 250.2
4 | 2 | 110.5
I will have to query on all attributes (use WHERE clause). For example –
Find all motherboards with weight >150g and max_ram>16M
So, I need to query the attribute table and discover that max_ram in an int and weight is a float. So, I have to join motherboard
with motherboard_attribute_int
and motherboard_attribute_float
Is there a way to do this?
Is there an alternate database design that would serve this kind of data better?
Best Answer
What you really want is a table full of nullable fields, EAV is hard to get right, and with EAV you literally need three columns, not 6. It would be much easier to just keep adding nullable columns for your needs at this time.
What are you trading off? Nothing really. Performance and scalability issues are much harder to interpret and diagnose with an EAV format over the traditional wide-table format. You really only end up with tables with lots of null values that you just need good indexes to find the things you want quickly, which is a lot easier than done with an EAV.
Running 1 EAV table per data type is just as horrid as the five+ column design you were describing before from a perf and maint standpoint. You really don't want to do this. You really want to just add a lot of nullable columns to one wide table. You're shooting yourself in the foot and asking us how to bandage it before you begin. EAV is not the design pattern you want right now, in part because you don't understand how the model works. Model your business data, then model your database.
In your case, you want a table for motherboards with vendor, board name/model number, etc. I've modeled a simple table below. EAV would then be used with this table to model things like "link for driver downloads" which are very rarely going to be needed. Think about the main table as being searching details, and the EAV being details that you only need when you show a secondary tab on the UI.
And that's it. No more EAVs. If you need to use it as a float or an int, you do that in the application.
I have some other suggestions, if you are just hell-bent on using an EAV for tracking details, such as moving all of those socket bits to an EAV, but trust me when I tell you that will become much more tedious than what I gave you. This format lets you search for the details you want (like in a picker UI) and lets you build indexes for commonly searched features once you get a very large table. The only indexes I would suggest are on the board name, the socket type and count, and the ID for lookups. Everything else I would index after putting the system under load with actual data.
The EAV model really doesn't fit this business model right now, so without seeing an actual EAV schema intention, I would recommend you approach it from the style I've outlined above, and ignore the column widths.
The approach I've outlined above is about 500 bytes per record so far, which is relatively tiny. The indexes alone would be hard to maintain.
I hope this helps you reconsider the approach to your problem design, and we as always as a community are happy to help you flesh out the details.