The execution plan shown does not seem to match the big SELECT DISTINCT
query because the Sort
and Unique
steps are missing. Anyway you are correct than when retrieving ~50% of a table, index don't help. The best strategy is a big sequential scan of the main table and only fast hardware helps with that.
For the 2nd part of the question:
How would I go about selecting only the unique combinations of
adjacent columns? Is this too complicated a task to perform through a
database query? Would it speed up the query?
To remove duplicate combinations of adjacent columns, the structure of the resultset should be changed so that each output row has only one couple of adjacent columns along with their corresponding dimensions in the parallel coordinates graph. Well, except that the dimension for the 2nd column is not necessary since it's always the dimension for the other column plus one.
In one single query, this could be written like this:
WITH logs as (
SELECT log_time_mapped, syslog_priority_mapped,
operation_mapped, message_code_mapped, protocol_mapped,
source_ip_mapped, destination_ip_mapped,
source_port_mapped, destination_port_mapped,
destination_service_mapped, direction_mapped,
connections_built_mapped, connections_torn_down_mapped,
hourofday_mapped, meridiem_mapped
FROM firewall_logs_mapped
WHERE operation = 'Built')
SELECT DISTINCT 1, log_time_mapped, syslog_priority_mapped FROM logs
UNION ALL
SELECT DISTINCT 2, syslog_priority_mapped, operation_mapped FROM logs
UNION ALL
SELECT DISTINCT 3, operation_mapped, message_code_mapped FROM logs
UNION ALL
...etc...
SELECT DISTINCT 14, hourofday_mapped, meridiem_mapped FROM logs
;
The first SELECT DISTINCT
subquery extracts the lines to draw between dimensions 1 and 2, the next subquery between dimensions 2 and 3, and so on. DISTINCT
eliminates duplicates, so the client side doesn't have to do it. The UNION ALL
concatenates the results without any further processing.
However it's a heavy query and it's dubious that it would be any faster than what you're already doing.
The WITH
subquery is likely to gets slowly materialized on disk, so it might be interesting to compare the execution time with this other form repeating the same condition:
SELECT DISTINCT 1, log_time_mapped, syslog_priority_mapped
FROM firewall_logs_mapped WHERE operation = 'Built'
UNION ALL
SELECT DISTINCT 2, syslog_priority_mapped, operation_mapped
FROM firewall_logs_mapped WHERE operation = 'Built'
UNION ALL
SELECT DISTINCT 3, operation_mapped, message_code_mapped
FROM firewall_logs_mapped WHERE operation = 'Built'
...etc...
;
For a moment I thought one might be able to use a pre-existing text_pattern_ops
index with the USING INDEX
clause when adding a UNIQUE CONSTRAINT
. But that fails, because:
ERROR: index "book2_name_like" does not have default sorting behavior
Per documentation:
The index cannot have expression columns nor be a partial index. Also,
it must be a b-tree index with default sort ordering. These
restrictions ensure that the index is equivalent to one that would be
built by a regular ADD PRIMARY KEY
or ADD UNIQUE
command.
For instance, a unique index like that would allow a FK constraint referencing it, but perform terribly, because it does not support standard operators.
Per documentation:
Note that you should also create an index with the default operator
class if you want queries involving ordinary <
, <=
, >
, or >=
comparisons to use an index.
So to answer the question:
If you need a UNIQUE CONSTRAINT
(among other reasons: to reference it with a FK), your first variant with constraint and index is the only option. Additionally, the default operator class of the index created by the constraint supports more operations (like sorting in default sort order).
If you don't need any of that go with your second variant because, obviously, just one index is cheaper to maintain: just a UNIQUE text_pattern_ops
index.
Differences between index and constraint:
Alternative with COLLATE "C"
Instead of creating two indexes, there is another alternative for xxx
_pattern_ops
indexes that may be preferable. The documentation:
The difference from the default operator classes is that the values
are compared strictly character by character rather than according to
the locale-specific collation rules. This makes these operator
classes suitable for use by queries
involving pattern matching expressions (LIKE
or POSIX regular
expressions) when the database does not use the standard "C" locale.
And:
The index automatically uses the collation of the underlying column.
You can create the column without collation (using COLLATE "C"
). Then the default operator class behaves the same way as text_pattern_ops
would - plus the index can be used with all the standard operators.
CREATE TABLE book2 (
book_id serial PRIMARY KEY,
book text NOT NULL COLLATE "C" UNIQUE -- that's all!
);
Now, LIKE
can use the index:
SELECT * FROM book2 WHERE book LIKE 'foo%';
But ILIKE
still can't:
SELECT * FROM book2 WHERE book ILIKE 'foo%';
SQL Fiddle.
Consider a trigram index using the additional module pg_trgm for a more general solution.:
Best Answer
You could use a UNIQUE INDEX instead of the UNIQUE CONSTRAINT using the function coalesce to treat null as a regular value for your uniqueness:
This will enforce your uniqueness as you described.
For details between a Unique constraint and a unique index,see: https://stackoverflow.com/questions/23542794/postgres-unique-constraint-vs-index