In my experience, trying to encode multiple data points into a single column always ends up being more trouble than it's worth. Sure, it seems cool and clever to use BITWISE operators, but there are many things that can go wrong, and it won't always be efficient to test those bits without cumbersome, unintuitive workarounds. It's the same reason we stay away from storing comma-separated lists, JSON strings, etc. in a single column - eventually you care about viewing or filtering on those distinct bits, which you then have to extract, sometimes expensively.
With the information I have, my vote is for three separate BIT columns. They will still collapse to similar storage patterns as a single column with the three bits on/off, and can be made more efficient individually and across the board in several ways, including:
- data compression
- sparse columns
- filtered indexes (e.g. WHERE allow_returns = 1)
Someone else advocated for three CHAR(1) columns. These do not benefit from storage collapse and also require a check constraint, making them less than ideal in my mind.
Now, my answer might change if you say, "well, what if I might add 15 other attributes in the future?" I certainly don't think it's wise to build the columns this way if they're not relatively static - changing the schema (and therefore all of the code and interfaces to it) for every new or changed attribute is going to be a royal pain. So in that case you might want to consider EAV - where the attributes are not part of the metadata but part of the data. There are a lot of objections to EAV, mostly around performance and the difficulty in enforcing constraints (in this case unlikely to be an issue if all of these attributes are either on or off), but it worked quite well for us at my previous job. You might model it like this:
CREATE TABLE dbo.Attributes
(
    AttributeID TINYINT PRIMARY KEY,
    Name VARCHAR(32) NOT NULL UNIQUE
);

CREATE TABLE dbo.ItemAttributes
(
    ItemID INT NOT NULL
        FOREIGN KEY REFERENCES dbo.Items(ItemID),
    AttributeID TINYINT NOT NULL
        FOREIGN KEY REFERENCES dbo.Attributes(AttributeID),
    Status BIT NOT NULL,
    PRIMARY KEY(ItemID, AttributeID)
);
And again, you can have filtered indexes here to make certain queries much more efficient, such as (imagine the AttributeID for "allow returns" is 10):
CREATE INDEX optAllowReturns ON dbo.ItemAttributes(ItemID)
WHERE AttributeID = 10 AND Status = 1;
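A query like the following (again assuming AttributeID 10 means "allow returns") has predicates that match the filter, so the optimizer can satisfy it from that small index alone:

```sql
-- all items that allow returns
SELECT ItemID
FROM dbo.ItemAttributes
WHERE AttributeID = 10
  AND Status = 1;
```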
If you have certain attributes that are not on/off (for example, three states of manufacture or shipping), you can change the Status column to:
Value TINYINT NOT NULL
This can double as an on/off value for attributes that are represented that way, and as a tri- or more-state value for attributes that require more than simple on/off. You can also reflect which type is which in the metadata of the dbo.Attributes table.
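One way to record that in the metadata (the ValueType column and its allowed values are my assumption, not part of the original design):

```sql
-- 'flag' = simple on/off, 'multi' = tri- or more-state
ALTER TABLE dbo.Attributes
    ADD ValueType VARCHAR(10) NOT NULL DEFAULT 'flag'
        CHECK (ValueType IN ('flag', 'multi'));
```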
I know that you are not asking about database security per se, but you can do what you want using database security. You can even use this in a web app. If you don't want to use database security, then the schemas still apply.
You want column-level security, row-level security, and probably hierarchical role management. Role-Based security is much easier to manage than User-Based security.
This example code is for PostgreSQL 9.4, which comes out soon. You can do it with 9.3, but there is more manual labour required.
You want everything to be indexable if you are concerned with performance †, which you should be. This means that bit-mask and array fields probably won't be a good idea.
In this example, we keep the main data tables in the data schema, and corresponding views in public.
create schema data; --main data tables
create schema security; --acls, security triggers, default privileges
create table data.thing (
    thing_id int primary key,
    subject text not null, --or whatever
    owner name not null
);
Put a trigger on data.thing for inserts and updates enforcing that the owner column is the current_user. Perhaps allow only the owner to delete his own records (another trigger).
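A minimal sketch of the owner-enforcing trigger (function and trigger names are my assumptions):

```sql
--force the owner column to the current user on insert and update
create function security.set_owner() returns trigger
language plpgsql as $$
begin
    new.owner := current_user;
    return new;
end $$;

create trigger thing_set_owner
    before insert or update of owner on data.thing
    for each row execute procedure security.set_owner();
```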
Create a WITH CHECK OPTION view, which is what users will actually use. Try really hard to make it updatable, otherwise you'll need triggers/rules, which is more work.
create view public.thing with(security_barrier) as
select
    thing_id,
    subject,
    owner
from data.thing
where
    pg_has_role(owner, 'member') --only the owner, or roles "above" him, can view his rows
with check option;
Next, create an access-control list table:
--privileges r=read, w=write
create table security.thing_acl (
    thing_id int,
    grantee name, --the role to whom you are granting the privilege
    privilege char(1) check ( privilege in ('r','w') ),
    primary key (thing_id, grantee, privilege),
    foreign key (thing_id) references data.thing(thing_id) on delete cascade
);
Change your view to account for ACLs:
drop view public.thing;
create view public.thing with(security_barrier) as
select
thing_id,
subject,
owner
from data.thing a
where
pg_has_role(owner, 'member')
or exists (select 1 from security.thing_acl b where b.thing_id = a.thing_id and pg_has_role(grantee, 'member') and privilege='r')
with check option;
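With that in place, granting access is just a row insert. For example (the role name and ID here are made up):

```sql
--give the role 'alice' read access to row 42
insert into security.thing_acl (thing_id, grantee, privilege)
values (42, 'alice', 'r');
```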
Create a default row privileges table:
create table security.default_row_privileges (
table_name name,
role_name name,
privilege char(1),
primary key (table_name, role_name, privilege)
);
Put a trigger on insert on data.thing so that it copies default row privileges to security.thing_acl .
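A sketch of that copy trigger (again, function and trigger names are assumptions):

```sql
--copy the defaults for this table into the per-row ACL
create function security.copy_default_privileges() returns trigger
language plpgsql as $$
begin
    insert into security.thing_acl (thing_id, grantee, privilege)
    select new.thing_id, role_name, privilege
    from security.default_row_privileges
    where table_name = 'thing';
    return new;
end $$;

create trigger thing_default_acl
    after insert on data.thing
    for each row execute procedure security.copy_default_privileges();
```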
- Adjust table-level security appropriately (prevent inserts from unwanted users). No one should be able to read the data or security schemas.
- Adjust column-level security appropriately (prevent some users from seeing/editing some columns). You can use has_column_privilege() to check that a user can see a column.
- Probably want security definer tag on your view.
- Consider adding grantor and admin_option columns to acl tables to track who granted the privilege, and whether the grantee can manage privileges on that row.
- Test lots
† In this case pg_has_role is probably not indexable. You would have to get a list of all superior roles to current_user and compare to the owner/grantee value instead.
Well, I am an advocate of ENUM -- at least in limited use.

I would use it for status with a small, reasonably static list of possible values. I would start with unknown to catch things that are typos. ALTER TABLE has long been optimized to add a new option on the end of an ENUM list.

I would not use ENUM for continents. If there are standard abbreviations, I would use a short VARCHAR for such; for countries, I advocate the same. Using a lookup table can be very bad for efficiency. A "star" schema in Data Warehousing can be terribly inefficient; this happens when multiple JOINs are needed, and ENUM avoids that inefficiency. VARCHAR is bulky, which is a big issue in DW applications.

An ENUM acts like a string in many situations: WHERE status = 'OK' is more readable than either of the alternatives. As for comparing two ints versus two strings: ints are not enough faster to matter.

In short:
- VARCHAR -- use when bulkiness is not a problem.
- ENUM -- use when bulkiness is a problem, and the number of choices is quite small and stable.
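A minimal MySQL sketch of the advice above (the table name and status values are my assumptions):

```sql
CREATE TABLE orders (
    id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    status ENUM('unknown', 'new', 'shipped', 'done')
        NOT NULL DEFAULT 'unknown'
);

-- Appending a new value at the END of the list is the cheap,
-- optimized form of ALTER; reordering or removing values is not:
ALTER TABLE orders
    MODIFY status ENUM('unknown', 'new', 'shipped', 'done', 'canceled')
        NOT NULL DEFAULT 'unknown';
```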