In my experience, trying to encode multiple data points into a single column always ends up being more trouble than it's worth. Sure, it seems cool and clever to use BITWISE operators, but there are many things that can go wrong, and it won't always be efficient to test those bits without cumbersome, unintuitive workarounds. It's the same reason we stay away from storing comma-separated lists, JSON strings, etc. in a single column - eventually you care about viewing or filtering on those distinct bits, which you then have to extract, sometimes expensively.
With the information I have, my vote is for three separate BIT columns. They will still collapse to similar storage patterns as a single column with the three bits on/off, and can be made more efficient individually and across the board in several ways, including:
- data compression
- sparse columns
- filtered indexes (e.g. WHERE allow_returns = 1)
Someone else advocated for three CHAR(1) columns. These do not benefit from storage collapse and also require a check constraint, making them less than ideal in my mind.
Now, my answer might change if you say, "well, what if I might add 15 other attributes in the future?" I certainly don't think it's wise to build the columns this way if they're not relatively static - changing the schema (and therefore all of the code and interfaces to it) for every new or changed attribute is going to be a royal pain. So in that case you might want to consider EAV - where the attributes are not part of the metadata but part of the data. There are a lot of objections to EAV, mostly around performance and the difficulty in enforcing constraints (in this case unlikely to be an issue if all of these attributes are either on or off), but it worked quite well for us at my previous job. You might model it like this:
CREATE TABLE dbo.Attributes
(
    AttributeID TINYINT PRIMARY KEY,
    Name VARCHAR(32) NOT NULL UNIQUE
);

CREATE TABLE dbo.ItemAttributes
(
    ItemID INT NOT NULL
        FOREIGN KEY REFERENCES dbo.Items(ItemID),
    AttributeID TINYINT NOT NULL
        FOREIGN KEY REFERENCES dbo.Attributes(AttributeID),
    Status BIT NOT NULL,
    PRIMARY KEY(ItemID, AttributeID)
);
And again, you can have filtered indexes here to make certain queries much more efficient, such as (imagine the AttributeID for "allow returns" is 10):
CREATE INDEX optAllowReturns ON dbo.ItemAttributes(ItemID)
WHERE AttributeID = 10 AND Status = 1;
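A query like the following (again assuming AttributeID 10 means "allow returns") has predicates that match the filter, so the optimizer can satisfy it from that small index alone:

```sql
-- all items that allow returns
SELECT ItemID
FROM dbo.ItemAttributes
WHERE AttributeID = 10
  AND Status = 1;
```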
If you have certain attributes that are not on/off (for example, three states of manufacture or shipping), you can change the Status column to:
Value TINYINT NOT NULL
This can double as an on/off value for attributes that are represented that way, and as a tri- or more-state value for attributes that require more than simple on/off. You can also reflect which type is which in the metadata of the dbo.Attributes table.
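One way to record that in the metadata (the ValueType column and its allowed values are my assumption, not part of the original design):

```sql
-- 'flag' = simple on/off, 'multi' = tri- or more-state
ALTER TABLE dbo.Attributes
    ADD ValueType VARCHAR(10) NOT NULL DEFAULT 'flag'
        CHECK (ValueType IN ('flag', 'multi'));
```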
I know that you are not asking about database security per se, but you can do what you want using database security. You can even use this in a web app. If you don't want to use database security, then the schemas still apply.
You want column-level security, row-level security, and probably hierarchical role management. Role-Based security is much easier to manage than User-Based security.
This example code is for PostgreSQL 9.4, which comes out soon. You can do it with 9.3, but there is more manual labour required.
You want everything to be indexable if you are concerned with performance †, which you should be. This means that bit-mask and array fields probably won't be a good idea.
In this example, we keep the main data tables in the data schema, and corresponding views in public.
create schema data; --main data tables
create schema security; --acls, security triggers, default privileges
create table data.thing (
    thing_id int primary key,
    subject text not null, --or whatever
    owner name not null
);
Put a trigger on data.thing for inserts and updates enforcing that the owner column is the current_user. Perhaps allow only the owner to delete his own records (another trigger).
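A minimal sketch of the owner-enforcing trigger (function and trigger names are my assumptions):

```sql
--force the owner column to the current user on insert and update
create function security.set_owner() returns trigger
language plpgsql as $$
begin
    new.owner := current_user;
    return new;
end $$;

create trigger thing_set_owner
    before insert or update of owner on data.thing
    for each row execute procedure security.set_owner();
```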
Create a WITH CHECK OPTION view, which is what users will actually use. Try really hard to make it updatable, otherwise you'll need triggers/rules, which is more work.
create view public.thing with(security_barrier) as
select
    thing_id,
    subject,
    owner
from data.thing
where
    pg_has_role(owner, 'member') --only the owner, or roles "above" him, can view his rows
with check option;
Next, create an access-control list table:
--privileges r=read, w=write
create table security.thing_acl (
    thing_id int,
    grantee name, --the role to whom you are granting the privilege
    privilege char(1) check ( privilege in ('r','w') ),
    primary key (thing_id, grantee, privilege),
    foreign key (thing_id) references data.thing(thing_id) on delete cascade
);
Change your view to account for ACLs:
drop view public.thing;
create view public.thing with(security_barrier) as
select
thing_id,
subject,
owner
from data.thing a
where
pg_has_role(owner, 'member')
or exists (select 1 from security.thing_acl b where b.thing_id = a.thing_id and pg_has_role(grantee, 'member') and privilege='r')
with check option;
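With that in place, granting access is just a row insert. For example (the role name and ID here are made up):

```sql
--give the role 'alice' read access to row 42
insert into security.thing_acl (thing_id, grantee, privilege)
values (42, 'alice', 'r');
```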
Create a default row privileges table:
create table security.default_row_privileges (
table_name name,
role_name name,
privilege char(1),
primary key (table_name, role_name, privilege)
);
Put a trigger on insert on data.thing so that it copies default row privileges to security.thing_acl .
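A sketch of that copy trigger (again, function and trigger names are assumptions):

```sql
--copy the defaults for this table into the per-row ACL
create function security.copy_default_privileges() returns trigger
language plpgsql as $$
begin
    insert into security.thing_acl (thing_id, grantee, privilege)
    select new.thing_id, role_name, privilege
    from security.default_row_privileges
    where table_name = 'thing';
    return new;
end $$;

create trigger thing_default_acl
    after insert on data.thing
    for each row execute procedure security.copy_default_privileges();
```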
- Adjust table-level security appropriately (prevent inserts from unwanted users). No one should be able to read the data or security schemas.
- Adjust column-level security appropriately (prevent some users from seeing/editing some columns). You can use has_column_privilege() to check that a user can see a column.
- Probably want security definer tag on your view.
- Consider adding grantor and admin_option columns to acl tables to track who granted the privilege, and whether the grantee can manage privileges on that row.
- Test lots
† In this case pg_has_role is probably not indexable. You would have to get a list of all superior roles to current_user and compare to the owner/grantee value instead.
Well, I am an advocate of ENUM -- at least in limited use.

I would use it for status with a small, reasonably static list of possible values. I would start with unknown to catch things that are typos. ALTER TABLE has long been optimized to add a new option on the end of an ENUM list.

I would not use ENUM for continents. If there are standard abbreviations, I would use a short VARCHAR for such; for countries, I advocate the same. Using a lookup table can be very bad for efficiency. A "star" schema in Data Warehousing can be terribly inefficient; this happens when multiple JOINs are needed, and ENUM avoids that inefficiency. VARCHAR is bulky, which is a big issue in DW applications.

An ENUM acts like a string in many situations: WHERE status = 'OK' is more readable than either of the alternatives. As for comparing two ints versus two strings: ints are not enough faster to matter.

In short:
- VARCHAR -- use when bulkiness is not a problem.
- ENUM -- use when bulkiness is a problem, and the number of choices is quite small and stable.
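A minimal MySQL sketch of the advice above (the table name and status values are my assumptions):

```sql
CREATE TABLE orders (
    id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    status ENUM('unknown', 'new', 'shipped', 'done')
        NOT NULL DEFAULT 'unknown'
);

-- Appending a new value at the END of the list is the cheap,
-- optimized form of ALTER; reordering or removing values is not:
ALTER TABLE orders
    MODIFY status ENUM('unknown', 'new', 'shipped', 'done', 'canceled')
        NOT NULL DEFAULT 'unknown';
```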