Sql-server – Text Storage and Database Design Optimization in a SQL Server Database

database-design, optimization, sql-server

Given a table with a number of columns (c), where some of these columns contain text that is restricted to one of a fixed set of values by a CONSTRAINT on the column, what is the most efficient way to structure the database? Consider also the number of rows (r) the table may grow to.

Example: given a table with three columns, one being an auto-incrementing integer ID column, another being a Fruit column where the acceptable fruits are 'Apple', 'Banana', 'Orange', and 'Strawberry', and the final column being Color with 'Red', 'Orange', 'Green', and 'Yellow' as acceptable values. Let us assume the Fruit and Color columns have a data type of varchar(50). Which of the following would be good practice and a suitable database design?

Should each value be directly represented in the table, as follows for the previous example? Would it be fair to say that, after a large number of rows, querying such a table would be very inefficient due to the storage and computation required for the text?
+-----+--------+--------+
| ID  | Fruit  | Color  |
+-----+--------+--------+
| 1   | Banana | Yellow |
| 2   | Apple  | Green  |
| ... | ...    | ...    |
| 500 | Orange | Orange |
+-----+--------+--------+

OR
For each column that is restricted to one of several values, should those values be stored in a separate table keyed by an integer, with the main table holding the integer and an INNER JOIN bringing the two together? Does this depend on the length of the text values being stored in that field? For the given example, we would then have three tables.
+----+------------+
| ID | FruitName  |
+----+------------+
| 1  | Apple      |
| 2  | Banana     |
| 3  | Orange     |
| 4  | Strawberry |
+----+------------+

+----+------------+
| ID | FruitColor |
+----+------------+
| 1  | Red        |
| 2  | Green      |
| 3  | Orange     |
| 4  | Yellow     |
+----+------------+

Then followed by:
+-----+-------+-------+
| ID  | Fruit | Color |
+-----+-------+-------+
| 1   | 2     | 4     |
| 2   | 1     | 2     |
| ... | ...   | ...   |
| 500 | 3     | 3     |
+-----+-------+-------+

Is there any mathematical 'rule-of-thumb' when designing a database, considering the number of rows and the type of data to be stored in each column? Or should the second schema be followed in general?

Best Answer

If all apples are green and all bananas are yellow, a Fruit table (ID int, Name varchar(50), Colour varchar(50)) would be appropriate, with your data table having a foreign key to it.

If you have yellow apples and orange bananas (yea for genetic engineering!), but only certain combinations are permitted, you will need FruitRainbow(ID int, FruitID, ColourID), with the latter two as FKs to your option 2 tables above and your data table having a FK to FruitRainbow.
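A minimal sketch of the permitted-combinations idea, assuming SQLite (through Python's sqlite3 module) as a stand-in for SQL Server so that it runs anywhere. The data table name Basket and the sample pairs are hypothetical; only FruitRainbow, FruitID and ColourID come from the answer.

```python
import sqlite3

# SQLite is only a convenient stand-in here; the same shape applies to SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE Fruit  (ID INTEGER PRIMARY KEY, FruitName  TEXT NOT NULL UNIQUE);
CREATE TABLE Colour (ID INTEGER PRIMARY KEY, FruitColor TEXT NOT NULL UNIQUE);

-- One row per permitted (fruit, colour) pair.
CREATE TABLE FruitRainbow (
    ID       INTEGER PRIMARY KEY,
    FruitID  INTEGER NOT NULL REFERENCES Fruit (ID),
    ColourID INTEGER NOT NULL REFERENCES Colour (ID),
    UNIQUE (FruitID, ColourID)
);

-- Hypothetical data table: each row points at one permitted pair.
CREATE TABLE Basket (
    ID             INTEGER PRIMARY KEY,
    FruitRainbowID INTEGER NOT NULL REFERENCES FruitRainbow (ID)
);

INSERT INTO Fruit  VALUES (1, 'Apple'), (2, 'Banana'), (3, 'Orange'), (4, 'Strawberry');
INSERT INTO Colour VALUES (1, 'Red'), (2, 'Green'), (3, 'Orange'), (4, 'Yellow');
-- Sample permitted pairs: red apple, green apple, yellow banana.
INSERT INTO FruitRainbow VALUES (1, 1, 1), (2, 1, 2), (3, 2, 4);
""")

conn.execute("INSERT INTO Basket (FruitRainbowID) VALUES (3)")  # yellow banana: allowed

try:
    conn.execute("INSERT INTO Basket (FruitRainbowID) VALUES (99)")  # no such pair
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

basket_count = conn.execute("SELECT COUNT(*) FROM Basket").fetchone()[0]
```

The point of the extra table is that an unlisted combination cannot be stored at all, because the data table can only reference rows that exist in FruitRainbow.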

If any fruit can occur in any colour, and you don't want to limit those combinations in advance, your option 2 is fine.
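For reference, option 2 from the question might look roughly like this. It is a sketch only, again assuming SQLite via Python's sqlite3 module in place of SQL Server; the main table name FruitData is hypothetical, the other names and rows are taken from the question's tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE Fruit (ID INTEGER PRIMARY KEY, FruitName  TEXT NOT NULL UNIQUE);
CREATE TABLE Color (ID INTEGER PRIMARY KEY, FruitColor TEXT NOT NULL UNIQUE);

-- Main table (name is hypothetical) stores only the integer keys.
CREATE TABLE FruitData (
    ID    INTEGER PRIMARY KEY,
    Fruit INTEGER NOT NULL REFERENCES Fruit (ID),
    Color INTEGER NOT NULL REFERENCES Color (ID)
);

INSERT INTO Fruit VALUES (1, 'Apple'), (2, 'Banana'), (3, 'Orange'), (4, 'Strawberry');
INSERT INTO Color VALUES (1, 'Red'), (2, 'Green'), (3, 'Orange'), (4, 'Yellow');
INSERT INTO FruitData VALUES (1, 2, 4), (2, 1, 2), (500, 3, 3);
""")

# Join the names back in to reproduce the option 1 view of the data.
rows = conn.execute("""
    SELECT d.ID, f.FruitName, c.FruitColor
    FROM FruitData AS d
    INNER JOIN Fruit AS f ON f.ID = d.Fruit
    INNER JOIN Color AS c ON c.ID = d.Color
    ORDER BY d.ID
""").fetchall()

for row in rows:
    print(row)
```

The foreign keys take over the job of the original CONSTRAINT: a value that is not in the lookup table cannot be stored in the main table.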

If your query is really about resource optimisation rather than relational integrity, then you'll have to decide what you want to gain and what you're willing to trade to get it. By using integer FKs instead of the natural names you get a smaller disk footprint at the cost of runtime load, because the names have to be joined back in whenever you want to see them. There are no free lunches. Pick your problem and solve it in the knowledge of the compromises it will entail.
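As a back-of-envelope way to put numbers on the disk side of that trade, here is a rough sketch. It assumes SQL Server's documented sizes (a varchar value takes its data length plus 2 bytes, an int takes 4 bytes), a single-byte collation, and that every permitted value is equally likely. Row headers, indexes, page overhead and compression are ignored, and the row count is hypothetical, so treat the result as an order of magnitude rather than a measurement.

```python
# Rough per-row payload for the two designs in the question.
FRUITS = ["Apple", "Banana", "Orange", "Strawberry"]
COLORS = ["Red", "Orange", "Green", "Yellow"]

VARCHAR_OVERHEAD = 2  # bytes added to each varchar value (assumed per SQL Server docs)
INT_BYTES = 4         # bytes per int column


def avg_varchar_bytes(values):
    """Average stored size of a varchar value, assuming each value is equally likely."""
    return sum(len(v) + VARCHAR_OVERHEAD for v in values) / len(values)


# Option 1: both columns stored as text. Option 2: both stored as int keys.
text_per_row = avg_varchar_bytes(FRUITS) + avg_varchar_bytes(COLORS)
int_per_row = 2 * INT_BYTES

r = 1_000_000  # hypothetical row count
text_total = r * text_per_row
int_total = r * int_per_row

print(f"text columns: {text_per_row} bytes/row, {text_total / 1e6:.2f} MB for {r} rows")
print(f"int columns:  {int_per_row} bytes/row, {int_total / 1e6:.2f} MB for {r} rows")
```

With short values like these the saving per row is modest; it grows with longer text values and more rows, and a smaller key type would shrink the integer side further.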

Related Solutions

Mysql – Database Design: What is the best structure for storing site offices and locations

Here is some vanilla SQL:

CREATE TABLE RoomTypes
(
 RoomType VARCHAR(12) NOT NULL,
 UNIQUE (RoomType)
);

CREATE TABLE Zones
(
 Zone VARCHAR(10) NOT NULL,
 UNIQUE (Zone)
);

CREATE TABLE Rooms
(
 RoomType VARCHAR(12) NOT NULL,
 Zone VARCHAR(10) NOT NULL, 
 Name VARCHAR(30) NOT NULL, 
 UNIQUE (RoomType, Zone, Name), 
 FOREIGN KEY (RoomType) REFERENCES RoomTypes (RoomType),
 FOREIGN KEY (Zone) REFERENCES Zones (Zone)
);

CREATE TABLE NumberedRooms 
(
 RoomType VARCHAR(12) NOT NULL,
 Zone VARCHAR(10) NOT NULL, 
 Name VARCHAR(30) NOT NULL, 
 CHECK (RoomType = 'Numbered'),
 UNIQUE (Name), 
 UNIQUE (RoomType, Zone, Name),
 FOREIGN KEY (RoomType, Zone, Name) 
    REFERENCES Rooms (RoomType, Zone, Name)
);

CREATE TABLE AncillaryRooms 
(
 RoomType VARCHAR(12) NOT NULL,
 Zone VARCHAR(10) NOT NULL, 
 Name VARCHAR(30) NOT NULL, 
 CHECK (RoomType = 'Ancillary'),
 UNIQUE (Zone, Name), 
 UNIQUE (RoomType, Zone, Name),
 FOREIGN KEY (RoomType, Zone, Name) 
    REFERENCES Rooms (RoomType, Zone, Name)
);

MySQL releases before 8.0.16 parse CHECK constraints but do not enforce them, so on those versions do the tests yourself using triggers. Consider adding other tests, e.g. that attribute Name in table NumberedRooms represents an integer.
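A trigger doing that kind of test could be sketched as below. This is an illustration under assumptions: it uses SQLite (through Python's sqlite3 module) so that it can be run directly, and SQLite's RAISE(ABORT, ...) plays the role that SIGNAL would play in a MySQL trigger. The trigger name and the helper try_insert are hypothetical, and the foreign key to Rooms is left out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE NumberedRooms (
    RoomType VARCHAR(12) NOT NULL,
    Zone     VARCHAR(10) NOT NULL,
    Name     VARCHAR(30) NOT NULL,
    UNIQUE (Name)
);

-- Reject rows whose RoomType is wrong or whose Name is not all digits.
CREATE TRIGGER NumberedRooms_check_insert
BEFORE INSERT ON NumberedRooms
WHEN NEW.RoomType <> 'Numbered'
  OR NEW.Name = ''
  OR NEW.Name GLOB '*[^0-9]*'
BEGIN
    SELECT RAISE(ABORT, 'NumberedRooms: bad RoomType or non-integer Name');
END;
""")


def try_insert(room_type, zone, name):
    """Return True if the row was accepted, False if the trigger rejected it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "INSERT INTO NumberedRooms VALUES (?, ?, ?)",
                (room_type, zone, name),
            )
        return True
    except sqlite3.DatabaseError:
        return False


print(try_insert("Numbered", "North", "101"))   # digits only, correct type
print(try_insert("Numbered", "North", "10B"))   # Name is not an integer
print(try_insert("Ancillary", "North", "102"))  # wrong RoomType for this table
```

A matching BEFORE UPDATE trigger would be needed as well, since this one only guards inserts.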

The idea is that every row in the supertype table Rooms will have exactly one row in the union of AncillaryRooms and NumberedRooms. The schema merely implies this rather than enforcing it, so consider having 'helper' procs that add rows to both the super- and subtype tables as a single operation, and use triggers to ensure this is done.
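One way such a helper could look, sketched in application code rather than as a stored procedure. It assumes SQLite via Python's sqlite3 module; the function name add_numbered_room is hypothetical, and the RoomTypes and Zones lookup tables are omitted for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE Rooms (
    RoomType VARCHAR(12) NOT NULL,
    Zone     VARCHAR(10) NOT NULL,
    Name     VARCHAR(30) NOT NULL,
    UNIQUE (RoomType, Zone, Name)
);

CREATE TABLE NumberedRooms (
    RoomType VARCHAR(12) NOT NULL CHECK (RoomType = 'Numbered'),
    Zone     VARCHAR(10) NOT NULL,
    Name     VARCHAR(30) NOT NULL,
    UNIQUE (Name),
    UNIQUE (RoomType, Zone, Name),
    FOREIGN KEY (RoomType, Zone, Name)
        REFERENCES Rooms (RoomType, Zone, Name)
);
""")


def add_numbered_room(conn, zone, name):
    """Insert the supertype and subtype rows together, or neither of them."""
    with conn:  # one transaction: commit on success, roll back if either INSERT fails
        conn.execute("INSERT INTO Rooms VALUES ('Numbered', ?, ?)", (zone, name))
        conn.execute("INSERT INTO NumberedRooms VALUES ('Numbered', ?, ?)", (zone, name))


add_numbered_room(conn, "North", "101")

try:
    # Fails on UNIQUE (Name) in NumberedRooms, so the Rooms row is rolled back too.
    add_numbered_room(conn, "South", "101")
    second_insert_failed = False
except sqlite3.IntegrityError:
    second_insert_failed = True

room_count = conn.execute("SELECT COUNT(*) FROM Rooms").fetchone()[0]
```

Wrapping both inserts in one transaction is what keeps a supertype row from being left behind without its subtype row when the second insert fails.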

Note that NumberedRooms has a simple key on Name alone, whereas AncillaryRooms has a compound key on (Zone, Name). All three (non-lookup) tables have a key on (RoomType, Zone, Name), allowing further subtype tables to reference them and maintain integrity by further testing for valid RoomType values.

Product Attribute List Design Pattern in MySQL

I personally would use a model similar to the following:

The product table would be pretty basic, your main product details:

create table product
(
  part_number int primary key,
  name varchar(10),
  price int
);
insert into product values
(1, 'product1', 50),
(2, 'product2', 95.99);

Second, the attribute table stores each of the different attributes.

create table attribute
(
  attributeid int primary key,
  attribute_name varchar(10),
  attribute_value varchar(50)
);
insert into attribute values
(1, 'color', 'red'),
(2, 'color', 'blue'),
(3, 'material', 'chrome'),
(4, 'material', 'plastic'),
(5, 'color', 'yellow'),
(6, 'size', 'x-large');

Finally, create the product_attribute table as the JOIN table between each product and its associated attributes.

create table product_attribute
(
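  part_number int,
  attributeid int,
  foreign key (part_number) references product (part_number),
  foreign key (attributeid) references attribute (attributeid)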
);
insert into product_attribute values
(1,  1),
(1,  3),
(2,  6),
(2,  2),
(2,  6);

Depending on how you want to use the data you are looking at two joins:

select *
from product p
left join product_attribute t
  on p.part_number = t.part_number
left join attribute a
  on t.attributeid = a.attributeid;

See SQL Fiddle with Demo. This returns data in the format:

PART_NUMBER | NAME       | PRICE | ATTRIBUTEID | ATTRIBUTE_NAME | ATTRIBUTE_VALUE
___________________________________________________________________________
1           | product1   | 50    | 1           | color          | red
1           | product1   | 50    | 3           | material       | chrome
2           | product2   | 96    | 6           | size           | x-large
2           | product2   | 96    | 2           | color          | blue
2           | product2   | 96    | 6           | size           | x-large

But if you want to return the data in a PIVOT format, where you have one row with all of the attributes as columns, you can use a conditional expression (IF() here; CASE works the same way) with an aggregate:

SELECT p.part_number,
  p.name,
  p.price,
  MAX(IF(a.ATTRIBUTE_NAME = 'color', a.ATTRIBUTE_VALUE, null)) as color,
  MAX(IF(a.ATTRIBUTE_NAME = 'material', a.ATTRIBUTE_VALUE, null)) as material,
  MAX(IF(a.ATTRIBUTE_NAME = 'size', a.ATTRIBUTE_VALUE, null)) as size
from product p
left join product_attribute t
  on p.part_number = t.part_number
left join attribute a
  on t.attributeid = a.attributeid
group by p.part_number, p.name, p.price;

See SQL Fiddle with Demo. Data is returned in the format:

PART_NUMBER | NAME       | PRICE | COLOR | MATERIAL | SIZE
_________________________________________________________________
1           | product1   | 50    | red   | chrome   | null
2           | product2   | 96    | blue  | null     | x-large

As you can see, the data might be in a better format for you, but if you have an unknown number of attributes, hard-coding attribute names quickly becomes untenable. In MySQL you can use prepared statements to create dynamic pivots. Your code would be as follows (see SQL Fiddle with Demo):

SET @sql = NULL;
SELECT
  GROUP_CONCAT(DISTINCT
    CONCAT(
      'MAX(IF(a.attribute_name = ''',
      attribute_name,
      ''', a.attribute_value, NULL)) AS ',
      attribute_name
    )
  ) INTO @sql
FROM attribute;

SET @sql = CONCAT('SELECT p.part_number
                    , p.name
                    , p.price
                    , ', @sql, ' 
                   from product p
                   left join product_attribute t
                     on p.part_number = t.part_number
                   left join attribute a
                     on t.attributeid = a.attributeid
                   GROUP BY p.part_number
                    , p.name
                    , p.price');

PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;

This generates the same result as the second version with no need to hard-code anything. While there are many ways to model this, I think this database design is the most flexible.
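If you would rather build the pivot in application code than with a server-side prepared statement, a rough equivalent is sketched below. This is a swapped-in technique, not the MySQL code above: it assumes SQLite via Python's sqlite3 module, uses CASE instead of IF(), and the helpers quote_ident and build_pivot_sql are hypothetical names. Prices are entered as the integers shown in the output tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (part_number INTEGER PRIMARY KEY, name TEXT, price INTEGER);
CREATE TABLE attribute (attributeid INTEGER PRIMARY KEY,
                        attribute_name TEXT, attribute_value TEXT);
CREATE TABLE product_attribute (part_number INTEGER, attributeid INTEGER);

INSERT INTO product VALUES (1, 'product1', 50), (2, 'product2', 96);
INSERT INTO attribute VALUES
    (1, 'color', 'red'), (2, 'color', 'blue'), (3, 'material', 'chrome'),
    (4, 'material', 'plastic'), (5, 'color', 'yellow'), (6, 'size', 'x-large');
INSERT INTO product_attribute VALUES (1, 1), (1, 3), (2, 6), (2, 2), (2, 6);
""")


def quote_ident(name):
    """Quote a string for use as an SQL identifier (here, a column alias)."""
    return '"' + name.replace('"', '""') + '"'


def build_pivot_sql(conn):
    """Return (sql, params) for a pivot with one column per distinct attribute name."""
    names = [row[0] for row in conn.execute(
        "SELECT DISTINCT attribute_name FROM attribute ORDER BY attribute_name")]
    # Attribute names are bound as parameters; only the aliases are spliced in, quoted.
    columns = ", ".join(
        "MAX(CASE WHEN a.attribute_name = ? THEN a.attribute_value END) AS "
        + quote_ident(n)
        for n in names)
    sql = f"""
        SELECT p.part_number, p.name, p.price, {columns}
        FROM product p
        LEFT JOIN product_attribute t ON p.part_number = t.part_number
        LEFT JOIN attribute a ON t.attributeid = a.attributeid
        GROUP BY p.part_number, p.name, p.price
        ORDER BY p.part_number
    """
    return sql, names


pivot_sql, pivot_params = build_pivot_sql(conn)
cursor = conn.execute(pivot_sql, pivot_params)
headers = [d[0] for d in cursor.description]
pivot_rows = cursor.fetchall()

print(headers)
for row in pivot_rows:
    print(row)
```

Identifiers cannot be passed as bound parameters, which is why the aliases are quoted by hand while the attribute names being compared are bound with placeholders.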
