PostgreSQL 9.3 – Performance of Single vs Multiple Column Unique Constraint

Tags: constraint, postgresql, postgresql-9.3

I'm using PostgreSQL 9.3. I can make a constraint unique either across the entire table or across a subset of the table (by using 2 columns in the unique constraint, I restrict the scope of uniqueness). Which option is better for lookups?

Consider this table where a unique alphanumeric code is allotted to each student of the class.

CREATE TABLE sc_table (
   name text NOT NULL,
   code text NOT NULL,
   class_id integer NOT NULL,
   CONSTRAINT class_fk FOREIGN KEY (class_id) REFERENCES class (id),
   CONSTRAINT sc_uniq UNIQUE (code)
);

Currently the code is unique across the entire table. However the specification says that it is sufficient for the code to be unique across the class only. For my design requirements there's no restriction either way.

However if I change the constraint to be unique for a given class only, how would it affect lookup by code?

Or, in other words, which of the following combination of constraint & lookup is the best speed wise:

-- 1. unique across entire table, lookup by value
CONSTRAINT sc_uniq UNIQUE (code)       
SELECT * FROM sc_table WHERE code='alpha-2-beta'

-- 2. unique across entire table, lookup by value & class
CONSTRAINT sc_uniq UNIQUE (code)       
SELECT * FROM sc_table WHERE class_id=1 AND code='alpha-2-beta' 

-- 3. unique per class, lookup by value
CONSTRAINT sc_uniq UNIQUE (code, class_id)       
SELECT * FROM sc_table WHERE code='alpha-2-beta'

-- 4. unique per class, lookup by value & class
CONSTRAINT sc_uniq UNIQUE (code, class_id)       
SELECT * FROM sc_table WHERE class_id=1 AND code='alpha-2-beta'

Question: My understanding is that 2 is better than 1 and 4 is better than 3. But which is better in 1 vs. 3 and in 2 vs. 4?

Update: Adding the output of EXPLAIN ANALYZE. 3 is bad because there's no index for the lookup. 2 seems to be the best, but the table is too small to conclude that.

-- 1
"Index Scan using sc_uniq on sc_table  (cost=0.15..8.17 rows=1 width=72) (actual time=0.041..0.044 rows=1 loops=1)"
"  Index Cond: (code = 'code1'::text)"
"Total runtime: 0.096 ms"

-- 2
"Index Scan using sc_uniq on sc_table  (cost=0.15..8.17 rows=1 width=72) (actual time=0.024..0.026 rows=1 loops=1)"
"  Index Cond: (code = 'code1'::text)"
"  Filter: (class_id = 1)"
"Total runtime: 0.056 ms"

-- 3
"Bitmap Heap Scan on sc_table2  (cost=4.18..12.64 rows=4 width=72) (actual time=0.052..0.053 rows=1 loops=1)"
"  Recheck Cond: (code = 'code1'::text)"
"  ->  Bitmap Index Scan on sc_uniq2  (cost=0.00..4.18 rows=4 width=0) (actual time=0.039..0.039 rows=1 loops=1)"
"        Index Cond: (code = 'code1'::text)"
"Total runtime: 0.121 ms"

-- 4
"Index Scan using sc_uniq2 on sc_table2  (cost=0.15..8.17 rows=1 width=72) (actual time=0.036..0.039 rows=1 loops=1)"
"  Index Cond: ((code = 'code1'::text) AND (class_id = 1))"
"Total runtime: 0.093 ms"

Best Answer

Your combinations in order of typical performance:

1. > 2. > 4. ( > 3.)

3. is invalid. If rows are only unique per (code, class_id), the lookup by code alone can return multiple rows and is different from the rest.
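A minimal sketch of why the lookup in 3. behaves differently. The table `sc_demo`, the constraint name `sc_demo_uniq` and the sample rows are hypothetical, and the foreign key is left out so the snippet runs on its own:

```sql
-- Hypothetical demo table: same columns as sc_table, without the foreign key.
CREATE TABLE sc_demo (
   name text NOT NULL,
   code text NOT NULL,
   class_id integer NOT NULL,
   CONSTRAINT sc_demo_uniq UNIQUE (code, class_id)
);

-- Both rows are accepted: the code repeats, but in different classes.
INSERT INTO sc_demo (name, code, class_id) VALUES
   ('Alice', 'alpha-2-beta', 1),
   ('Bob',   'alpha-2-beta', 2);

-- A lookup by code alone now matches two rows instead of one.
SELECT * FROM sc_demo WHERE code = 'alpha-2-beta';

-- Repeating the code inside the same class violates the constraint
-- and is rejected with a unique-violation error.
INSERT INTO sc_demo (name, code, class_id)
VALUES ('Carol', 'alpha-2-beta', 1);
```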

2. is pointless. If code is unique, there is no point in adding another predicate on class_id - except to verify that a given code actually belongs to a given class_id (and get no row otherwise).

Only 1. and 4. make sense and I would go with 1., of course. Unless you have additional requirements for the values of code, it's much more efficient to have one unique column. You could also make it the PK. Queries are simpler (one predicate instead of two), the (automatically created) unique index is potentially smaller (the most important factor here!), the lookup is typically slightly faster.
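To check the index size claim on your own data, one option is to build both variants and compare the size of the automatically created indexes. This is only a sketch: the table names `sc_one` and `sc_two`, the constraint names and the generated test data are assumptions for illustration. With short values the two sizes can come out equal because of alignment padding, which is why the index is only "potentially" smaller.

```sql
-- Hypothetical test tables, one per constraint variant.
CREATE TABLE sc_one (
   code text NOT NULL,
   class_id integer NOT NULL,
   CONSTRAINT sc_one_uniq UNIQUE (code)
);

CREATE TABLE sc_two (
   code text NOT NULL,
   class_id integer NOT NULL,
   CONSTRAINT sc_two_uniq UNIQUE (code, class_id)
);

-- Fill both tables with the same made-up data: 100000 codes over 50 classes.
INSERT INTO sc_one (code, class_id)
SELECT 'code' || g, g % 50 FROM generate_series(1, 100000) AS g;

INSERT INTO sc_two (code, class_id)
SELECT 'code' || g, g % 50 FROM generate_series(1, 100000) AS g;

-- Compare the on-disk size of the two unique indexes.
SELECT pg_size_pretty(pg_relation_size('sc_one_uniq')) AS one_column_index,
       pg_size_pretty(pg_relation_size('sc_two_uniq')) AS two_column_index;
```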

UPDATEs are also potentially more expensive for 4., where more columns trigger index updates. An UPDATE changing only class_id is cheaper for 1.
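As an illustration of the index maintenance point, consider a statement that moves a student to another class. The values are made up, and the comments describe general PostgreSQL behavior rather than a measurement:

```sql
-- Changes only class_id; code stays the same.
UPDATE sc_table
SET    class_id = 2
WHERE  code = 'alpha-2-beta';

-- With UNIQUE (code), class_id is not part of any index, so the update
-- can qualify as a heap-only tuple (HOT) update if the page has free
-- space, and no new index entry is needed.
-- With UNIQUE (code, class_id), an indexed column changes, so a new
-- index entry has to be written for the new row version.
```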

Your test result, with 2. faster than 1., is counter-intuitive and may be an artifact of your specific setup. Maybe the cache wasn't warm for the first run, or some other random factor interfered. The EXPLAIN output makes the relationship clear: the only difference between 1. and 2. is the additional Filter: (class_id = 1) step. There is nothing to gain here, you can only lose (even if very little in this case). 2. is typically a bit slower than 1., and 4. is also typically a bit slower than 1.
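One way to reduce such artifacts is to run each statement several times and compare only the later runs, once the relevant pages are cached. A sketch using the value from the posted plans; the BUFFERS option reports how many pages were found in shared buffers versus read from outside them:

```sql
-- Run each statement a few times and ignore the first execution.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM sc_table WHERE code = 'code1';

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM sc_table WHERE class_id = 1 AND code = 'code1';
```

Differences of a few hundredths of a millisecond between single runs, as in the posted output, are well within normal timing noise.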
