I'm using PostgreSQL 9.3. I want to understand which is better for lookups: a constraint that is unique across the entire table, or one that is unique only across a subset of the table (i.e. by using 2 columns in the unique constraint, I restrict the uniqueness).
Consider this table, where a unique alphanumeric code is allotted to each student of a class.
CREATE TABLE sc_table (
name text NOT NULL,
code text NOT NULL,
class_id integer NOT NULL,
CONSTRAINT class_fk FOREIGN KEY (class_id) REFERENCES class (id),
CONSTRAINT sc_uniq UNIQUE (code)
);
Currently the code is unique across the entire table. However, the specification says it is sufficient for the code to be unique within a class only. My design requirements don't restrict me either way.
However, if I change the constraint to be unique for a given class only, how would that affect lookups by code?
In other words, which of the following combinations of constraint and lookup is best speed-wise:
-- 1. unique across entire table, lookup by value
CONSTRAINT sc_uniq UNIQUE (code)
SELECT * FROM sc_table WHERE code='alpha-2-beta'
-- 2. unique across entire table, lookup by value & class
CONSTRAINT sc_uniq UNIQUE (code)
SELECT * FROM sc_table WHERE class_id=1 AND code='alpha-2-beta'
-- 3. unique per class, lookup by value
CONSTRAINT sc_uniq UNIQUE (code, class_id)
SELECT * FROM sc_table WHERE code='alpha-2-beta'
-- 4. unique per class, lookup by value & class
CONSTRAINT sc_uniq UNIQUE (code, class_id)
SELECT * FROM sc_table WHERE class_id=1 AND code='alpha-2-beta'
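To compare the two designs side by side, both tables can live in one scratch database. This is a sketch under assumptions: the class table's columns beyond id are hypothetical (the question only references class (id)), sc_table is repeated from above so the script runs on its own, the names sc_table2 and sc_uniq2 are taken from the EXPLAIN output below, class_fk2 is a made-up name, and the sample rows are invented.

```sql
-- Hypothetical parent table; the question only references class (id).
CREATE TABLE class (
    id   integer PRIMARY KEY,
    name text    NOT NULL
);

-- Combinations 1 and 2: code is unique across the whole table.
CREATE TABLE sc_table (
    name     text    NOT NULL,
    code     text    NOT NULL,
    class_id integer NOT NULL,
    CONSTRAINT class_fk FOREIGN KEY (class_id) REFERENCES class (id),
    CONSTRAINT sc_uniq UNIQUE (code)
);

-- Combinations 3 and 4: code is only unique within a class.
CREATE TABLE sc_table2 (
    name     text    NOT NULL,
    code     text    NOT NULL,
    class_id integer NOT NULL,
    CONSTRAINT class_fk2 FOREIGN KEY (class_id) REFERENCES class (id),
    CONSTRAINT sc_uniq2 UNIQUE (code, class_id)
);

-- Made-up sample data (same rows in both designs).
INSERT INTO class (id, name) VALUES (1, 'class one'), (2, 'class two');
INSERT INTO sc_table  (name, code, class_id) VALUES ('Ann', 'code1', 1), ('Bob', 'code2', 2);
INSERT INTO sc_table2 (name, code, class_id) VALUES ('Ann', 'code1', 1), ('Bob', 'code2', 2);
```

Combinations 1 and 2 then run against sc_table, and 3 and 4 against sc_table2.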
Question: my understanding is that 2 is better than 1, and 4 is better than 3. But which is better: 1 or 3, and 2 or 4?
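Update: adding the output of EXPLAIN ANALYZE. 3 is the slowest: code alone is not unique in that design, so the planner expects several rows (rows=4) and goes through a bitmap scan on sc_uniq2 instead of a plain index scan. 2 seems to be the best, but the table is too small to conclude that.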
-- 1
"Index Scan using sc_uniq on sc_table (cost=0.15..8.17 rows=1 width=72) (actual time=0.041..0.044 rows=1 loops=1)"
" Index Cond: (code = 'code1'::text)"
"Total runtime: 0.096 ms"
-- 2
"Index Scan using sc_uniq on sc_table (cost=0.15..8.17 rows=1 width=72) (actual time=0.024..0.026 rows=1 loops=1)"
" Index Cond: (code = 'code1'::text)"
" Filter: (class_id = 1)"
"Total runtime: 0.056 ms"
-- 3
"Bitmap Heap Scan on sc_table2 (cost=4.18..12.64 rows=4 width=72) (actual time=0.052..0.053 rows=1 loops=1)"
" Recheck Cond: (code = 'code1'::text)"
" -> Bitmap Index Scan on sc_uniq2 (cost=0.00..4.18 rows=4 width=0) (actual time=0.039..0.039 rows=1 loops=1)"
" Index Cond: (code = 'code1'::text)"
"Total runtime: 0.121 ms"
-- 4
"Index Scan using sc_uniq2 on sc_table2 (cost=0.15..8.17 rows=1 width=72) (actual time=0.036..0.039 rows=1 loops=1)"
" Index Cond: ((code = 'code1'::text) AND (class_id = 1))"
"Total runtime: 0.093 ms"
Best Answer
Taking your combinations one at a time:
3. is invalid. If rows are only unique per (code, class_id), the lookup by code alone can return multiple rows, so it is not equivalent to the other combinations.
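A minimal illustration of why 3. is not comparable, assuming a throwaway temporary table shaped like the per-class design (the table and constraint names are hypothetical, the foreign key is left out, and the rows are made up): UNIQUE (code, class_id) accepts the same code in two classes, so the lookup by code alone can return more than one row.

```sql
-- Hypothetical stand-alone table: per-class uniqueness, FK to class omitted.
CREATE TEMP TABLE sc_per_class (
    name     text    NOT NULL,
    code     text    NOT NULL,
    class_id integer NOT NULL,
    CONSTRAINT sc_per_class_uniq UNIQUE (code, class_id)
);

-- The same code in two different classes is allowed by UNIQUE (code, class_id).
INSERT INTO sc_per_class (name, code, class_id) VALUES
    ('Ann', 'alpha-2-beta', 1),
    ('Bob', 'alpha-2-beta', 2);

-- Combination 3: lookup by code alone returns both rows.
SELECT * FROM sc_per_class WHERE code = 'alpha-2-beta';

-- Combination 4: adding class_id narrows the result to a single row.
SELECT * FROM sc_per_class WHERE class_id = 1 AND code = 'alpha-2-beta';
```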
2. is pointless. If code is unique, there is no point in adding another predicate on class_id, except to verify that a given code actually belongs to a given class_id (and get no row otherwise).
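The same kind of sketch for 2., again with a hypothetical temporary table and made-up data, this time with code unique on its own: the extra class_id predicate can never return more than the lookup by code alone; it can only filter the single match away.

```sql
-- Hypothetical stand-alone table: code unique across the whole table.
CREATE TEMP TABLE sc_global (
    name     text    NOT NULL,
    code     text    NOT NULL,
    class_id integer NOT NULL,
    CONSTRAINT sc_global_uniq UNIQUE (code)
);

INSERT INTO sc_global (name, code, class_id) VALUES ('Ann', 'alpha-2-beta', 1);

-- Combination 1: at most one row, because code alone is unique.
SELECT * FROM sc_global WHERE code = 'alpha-2-beta';

-- Combination 2 with the matching class: the same single row.
SELECT * FROM sc_global WHERE class_id = 1 AND code = 'alpha-2-beta';

-- Combination 2 with another class: no row, so the extra predicate
-- only checks that the code belongs to that class.
SELECT * FROM sc_global WHERE class_id = 2 AND code = 'alpha-2-beta';
```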
Only 1. and 4. make sense, and I would go with 1., of course. Unless you have additional requirements for the values of code, it's much more efficient to have one unique column. You could also make it the PK. Queries are simpler (one predicate instead of two), the (automatically created) unique index is potentially smaller (the most important factor here!), and the lookup is typically slightly faster.
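A sketch of both suggestions, assuming sc_table and sc_table2 exist with the constraint names from the question and the EXPLAIN output (sc_uniq, sc_uniq2); sc_pkey is a made-up name. With only a handful of rows both indexes occupy the same minimum number of pages, so a size difference only shows up with realistic data volumes.

```sql
-- Promote code to primary key. Both subcommands run in a single
-- ALTER TABLE statement, so they succeed or fail together.
ALTER TABLE sc_table
    DROP CONSTRAINT sc_uniq,
    ADD CONSTRAINT sc_pkey PRIMARY KEY (code);

-- Compare the on-disk size of the one-column and two-column indexes.
SELECT c.relname                               AS index_name,
       pg_size_pretty(pg_relation_size(c.oid)) AS index_size
FROM   pg_class c
WHERE  c.relkind = 'i'
AND    c.relname IN ('sc_pkey', 'sc_uniq2');
```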
UPDATEs are also potentially more expensive for 4., where more columns trigger index updates. An UPDATE changing only class_id is cheaper for 1.
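One way to observe this, as a sketch: in PostgreSQL, an UPDATE that changes no indexed column can be done as a HOT (heap-only tuple) update, which adds no index entries, provided the new row version fits on the same heap page. The block assumes sc_table (unique on code only) and sc_table2 (with sc_uniq2 on (code, class_id)), each holding a row with code 'code1' in class 1, and a class with id 2 so the foreign key accepts the change. pg_stat_xact_user_tables reports counters for the current transaction, and the ROLLBACK discards the test changes.

```sql
BEGIN;

-- Change only class_id in both designs.
UPDATE sc_table  SET class_id = 2 WHERE code = 'code1';
UPDATE sc_table2 SET class_id = 2 WHERE code = 'code1' AND class_id = 1;

-- n_tup_hot_upd counts updates that needed no new index entries.
-- sc_table2 cannot get a HOT update here, because class_id is part of sc_uniq2;
-- sc_table can, as long as the heap page has free space.
SELECT relname, n_tup_upd, n_tup_hot_upd
FROM   pg_stat_xact_user_tables
WHERE  relname IN ('sc_table', 'sc_table2');

ROLLBACK;
```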
Your test result for 1. is counter-intuitive, maybe an artifact of your specific setup. Maybe you didn't prewarm the cache, or some other random factor interfered. It's pretty obvious from the EXPLAIN output: the only difference between 1. and 2. is the additional Filter: (class_id = 1) step. There is nothing to gain here; you can only lose (even if very little in this case).
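To get numbers that say more than a few rows can, here is a sketch that fills two hypothetical benchmark tables (bench_global and bench_per_class, foreign key left out) with 100,000 generated rows spread over 100 classes, then measures each combination. Running each EXPLAIN several times and comparing the later runs gives warm-cache figures; the pg_prewarm extension only arrived in 9.4, so on 9.3 repeating the query is the simple way to warm the cache. generate_series and EXPLAIN (ANALYZE, BUFFERS) are both available in 9.3.

```sql
-- Hypothetical benchmark tables mirroring the two designs.
CREATE TABLE bench_global (
    name     text    NOT NULL,
    code     text    NOT NULL,
    class_id integer NOT NULL,
    CONSTRAINT bench_global_uniq UNIQUE (code)
);
CREATE TABLE bench_per_class (
    name     text    NOT NULL,
    code     text    NOT NULL,
    class_id integer NOT NULL,
    CONSTRAINT bench_per_class_uniq UNIQUE (code, class_id)
);

-- Made-up data: codes are unique overall, so both constraints accept it.
INSERT INTO bench_global (name, code, class_id)
SELECT 'student ' || i, 'code' || i, i % 100 + 1
FROM   generate_series(1, 100000) AS i;

INSERT INTO bench_per_class SELECT * FROM bench_global;

-- Refresh planner statistics after the bulk load.
ANALYZE bench_global;
ANALYZE bench_per_class;

-- 'code4242' was generated with class_id = 4242 % 100 + 1 = 43.
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM bench_global    WHERE code = 'code4242';                    -- 1
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM bench_global    WHERE class_id = 43 AND code = 'code4242';  -- 2
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM bench_per_class WHERE code = 'code4242';                    -- 3
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM bench_per_class WHERE class_id = 43 AND code = 'code4242';  -- 4
```

The BUFFERS option shows how many pages each plan touched ("shared hit" versus "read"), which is a steadier basis for comparison than sub-millisecond timings.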
2. is typically a bit slower than 1., and 4. is also typically a bit slower than 1.