Mysql: Any way to properly index 3 ENUM columns with the same options ? (A OR B OR C)

enumindexMySQL

I have 3 enum() columns with the same option values inside.
I first tried to use the "set" datatype which originally was meant to do that (hold multiple values from a set) but it seems that datatype isn't managed for 15+ years and isn't even supporting an index.
Is there a nice way to index those 3 columns so I can use them in searched without destroying query performance ?

SELECT * FROM TABLE WHERE a='x' OR b='x' OR c='x'

I thought about creating a virtual field which uses a boolean logic (&) on the 3 enum field-numbers and combines them into a large number but that's quite a hack and not nice to maintain.

Has someone solved this sort of task elegantly ?
(I do not want to use a support table and JOIN it, I want to stay with a single table)

Best Answer

Because you need to use OR logic in your WHERE predicate, there's not a way to create one index that covers every case unfortunately.

Fortunately you're basically on the money with a virtual field or rather you should create a generated column, and then you can create an index on that single column so your queries are performant.

Answer

Since you refer to the website use-the-index-luke.com, consider the chapter:

Use The Index, Luke › The Where Clause › Searching For Ranges › Greater, Less and BETWEEN

It has an example that matches your situation perfectly (two-column index, one is tested for equality, the other for range), explains (with more of those nice index graphics) why @ypercube's advice is accurate and sums it up:

Rule of thumb: index for equality first — then for ranges.

Also good for just one column?

What to do for queries on just one column seems to be clear. More details and benchmarks concerning that under these related question:

Less selective column first?

Apart from that, what if you have only equality conditions for both columns?

It doesn't matter. Put the column first that is more likely to receive conditions of its own, which actually matters.

Consider this demo, or reproduce it yourself. I create a simple table of two columns with 100k rows. One with very few, the other one with lots of distinct values:

CREATE TEMP TABLE t AS
SELECT (random() * 10000)::int AS lots
     , (random() * 4)::int     AS few
FROM generate_series (1, 100000);

DELETE FROM t WHERE random() > 0.9;  -- create some dead tuples, more "real-life"

ANALYZE t;

SELECT count(distinct lots)   -- 9999
     , count(distinct few)    --    5
FROM   t;

Query:

SELECT *
FROM   t
WHERE  lots = 2345
AND    few = 2;

EXPLAIN ANALYZE output (Best of 10 to exclude caching effects):

Seq Scan on t  (cost=0.00..5840.84 rows=2 width=8)
               (actual time=5.646..15.535 rows=2 loops=1)
  Filter: ((lots = 2345) AND (few = 2))
  Buffers: local hit=443
Total runtime: 15.557 ms

Add index, retest:

CREATE INDEX t_lf_idx ON t(lots, few);

Index Scan using t_lf_idx on t  (cost=0.00..3.76 rows=2 width=8)
                                (actual time=0.008..0.011 rows=2 loops=1)
  Index Cond: ((lots = 2345) AND (few = 2))
  Buffers: local hit=4
Total runtime: 0.027 ms

Add other index, retest:

DROP INDEX t_lf_idx;
CREATE INDEX t_fl_idx  ON t(few, lots);

Index Scan using t_fl_idx on t  (cost=0.00..3.74 rows=2 width=8)
                                (actual time=0.007..0.011 rows=2 loops=1)
  Index Cond: ((few = 2) AND (lots = 2345))
  Buffers: local hit=4
Total runtime: 0.027 ms

Postgresql – How to implement business logic permissions in PostgreSQL (or SQL in general)

I know that you are not asking about database security per se, but you can do what you want using database security. You can even use this in a web app. If you don't want to use database security, then the schemas still apply.

You want column-level security, row-level security, and probably hierarchical role management. Role-Based security is much easier to manage than User-Based security.

This example code is for PostgreSQL 9.4, which comes out soon. You can do it with 9.3, but there is more manual labour required.

You want everything to be indexable if you are concerned with performance †, which you should be. This means that bit-mask and array fields probably won't be a good idea.

In this example, we keep the main data tables in the data schema, and corresponding views in public.

create schema data; --main data tables
create schema security; --acls, security triggers, default privileges

create table data.thing (
  thing_id int primary key,
  subject text not null, --or whatever
  owner name not null
);

Put a trigger on data.thing for inserts and updates enforcing that the owner column is the current_user. Perhaps allow only the owner to delete his own records (another trigger).

Create a WITH CHECK OPTION view, which is what users will actually use. Try really hard to make it updatable, otherwise you'll need triggers/rules, which is more work.

create view public.thing with(security_barrier) as 
select
thing_id,
subject,
owner,
from data.thing
where
pg_has_role(owner, 'member') --only owner or roles "above" him can view his rows. 
WITH CHECK OPTION;

Next, create an access-control list table:

--privileges r=read, w=write

create table security.thing_acl (
  thing_id int,
  grantee name, --the role to whom your are granting the privilege
  privilege char(1) check (privilege in ('r','w') ),

  primary key (thing_id, grantee, privilege),

  foreign key (thing_id) references data.thing(thing_id) on delete cascade
);

Change your view to account for ACLs:

drop view public.thing;

create view public.thing with(security_barrier) as 
select
thing_id,
subject,
owner
from data.thing a
where
pg_has_role(owner, 'member')
or exists (select 1 from security.thing_acl b where b.thing_id = a.thing_id and pg_has_role(grantee, 'member') and privilege='r')
with check option;

Create a default row privileges table:

create table security.default_row_privileges (
  table_name name,
  role_name name,
  privilege char(1),

  primary key (table_name, role_name, privilege)
);

Put a trigger on insert on data.thing so that it copies default row privileges to security.thing_acl .

Adjust table-level security appropriately (prevent inserts from unwanted users). No one should be able to read the data or security schemas.
Adjust column-level security appropriately (prevent some users from seeing/editing some columns). You can use has_column_privilege() to check that a user can see a column.
Probably want security definer tag on your view.
Consider adding grantor and admin_option columns to acl tables to track who granted the privilege, and whether the grantee can manage privileges on that row.
Test lots

† In this case pg_has_role is probably not indexable. You would have to get a list of all superior roles to current_user and compare to the owner/grantee value instead.

Best Answer

Related Solutions

PostgreSQL Index Tuning – Multicolumn Index and Performance

Answer

Also good for just one column?

Less selective column first?

Postgresql – How to implement business logic permissions in PostgreSQL (or SQL in general)

Related Question