It might be worth posting the table definition from your other question for clarity.
The composite index is doing a few things for you:
- As you know, enforcing uniqueness on (sensor_id, timestamp); I'm unsure whether this is an important data integrity constraint.
- Allowing queries that filter on both columns to look up matching rows by using a single index. MySQL can answer some queries (equality conditions on multiple columns are the ones I know about) by merging two indexes, but this tends to be significantly slower compared to using a single composite index.
- The index can also be used to search on a leftmost prefix of its columns, but not on a trailing subset. So in this case it could help a query that filters on sensor_id alone or on both sensor_id and timestamp, but not on timestamp alone.
There are a number of caveats to this, so it's a good idea to look at the EXPLAIN output for your queries and verify which indexes they're using. Also keep in mind that indexes can support the read part of UPDATE and DELETE queries, as well as JOINs, GROUP BY, ORDER BY, and other operations I'm neglecting.
An example of a scenario where the composite index is unnecessary would be if you don't care about the uniqueness constraint and all your queries filter on timestamp or sensor_id, but not both.
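As a rough sketch of the leftmost-prefix rule and of checking it with EXPLAIN: the table below is hypothetical (the real definition was not posted, so the table name, column types, and index name are assumptions made for illustration only).

```sql
-- Hypothetical table; names and types are assumptions, not the asker's schema.
CREATE TABLE sensor_readings (
    sensor_id   INT      NOT NULL,
    `timestamp` DATETIME NOT NULL,
    reading     DOUBLE,
    UNIQUE KEY uq_sensor_ts (sensor_id, `timestamp`)
);

-- Filters on the leftmost column: the composite index is a candidate.
EXPLAIN SELECT * FROM sensor_readings WHERE sensor_id = 42;

-- Filters on both columns: the composite index is a candidate.
EXPLAIN SELECT * FROM sensor_readings
WHERE sensor_id = 42 AND `timestamp` >= '2024-01-01';

-- Filters on timestamp alone: not a leftmost prefix, so the composite
-- index cannot be used to look up the matching rows.
EXPLAIN SELECT * FROM sensor_readings WHERE `timestamp` >= '2024-01-01';
```

On a populated table, the key column of the EXPLAIN output should show which index, if any, the optimizer picked for each query. The optimizer's choice also depends on table size and statistics, so it's worth running this against realistic data.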
The single-column index on sensor_id is actually redundant, since the composite index on (sensor_id, timestamp) can be used by the same queries. Still, you might find that some queries run faster when scanning the single-column index than when using the composite index with its wider key. The difference might not be enough to matter, though, and some testing will probably be required to find out.
In addition to looking at the EXPLAIN output for your queries, tools such as pt-index-usage from Percona Toolkit, or the INFORMATION_SCHEMA.INDEX_STATISTICS table if you're running Percona Server or MariaDB, can help you assess which indexes are actually being used.
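A minimal sketch of querying that table, assuming Percona Server or MariaDB with the userstat variable enabled (the statistics are not collected otherwise). The table name sensor_readings is a hypothetical placeholder for your own table.

```sql
-- Assumes userstat is enabled; 'sensor_readings' is a placeholder table name.
SELECT TABLE_SCHEMA, TABLE_NAME, INDEX_NAME, ROWS_READ
FROM INFORMATION_SCHEMA.INDEX_STATISTICS
WHERE TABLE_NAME = 'sensor_readings'
ORDER BY ROWS_READ DESC;
```

An index that has not been read since statistics collection started simply won't appear in the result, which makes it a candidate for closer inspection rather than proof that it is safe to drop.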
References:
- The Optimization That (Often) Isn’t: Index Merge Intersection
- Practical MySQL indexing guidelines
The execution plan shown does not seem to match the big SELECT DISTINCT query, because the Sort and Unique steps are missing. Anyway, you are correct that when retrieving ~50% of a table, indexes don't help. The best strategy is a big sequential scan of the main table, and only fast hardware helps with that.
For the 2nd part of the question:
How would I go about selecting only the unique combinations of
adjacent columns? Is this too complicated a task to perform through a
database query? Would it speed up the query?
To remove duplicate combinations of adjacent columns, the structure of the result set should be changed so that each output row holds only one pair of adjacent columns, along with their corresponding dimensions in the parallel coordinates graph. In fact, the dimension of the 2nd column is not necessary, since it's always the dimension of the first column plus one.
In one single query, this could be written like this:
WITH logs AS (
    SELECT log_time_mapped, syslog_priority_mapped,
           operation_mapped, message_code_mapped, protocol_mapped,
           source_ip_mapped, destination_ip_mapped,
           source_port_mapped, destination_port_mapped,
           destination_service_mapped, direction_mapped,
           connections_built_mapped, connections_torn_down_mapped,
           hourofday_mapped, meridiem_mapped
    FROM firewall_logs_mapped
    WHERE operation = 'Built'
)
SELECT DISTINCT 1, log_time_mapped, syslog_priority_mapped FROM logs
UNION ALL
SELECT DISTINCT 2, syslog_priority_mapped, operation_mapped FROM logs
UNION ALL
SELECT DISTINCT 3, operation_mapped, message_code_mapped FROM logs
UNION ALL
...etc...
SELECT DISTINCT 14, hourofday_mapped, meridiem_mapped FROM logs
;
The first SELECT DISTINCT subquery extracts the lines to draw between dimensions 1 and 2, the next subquery those between dimensions 2 and 3, and so on. DISTINCT eliminates duplicates, so the client side doesn't have to do it. The UNION ALL concatenates the results without any further processing.
However, it's a heavy query, and it's doubtful that it would be any faster than what you're already doing.
The WITH subquery is likely to get materialized on disk, which is slow, so it might be interesting to compare the execution time with this other form, which repeats the same condition in each branch:
SELECT DISTINCT 1, log_time_mapped, syslog_priority_mapped
FROM firewall_logs_mapped WHERE operation = 'Built'
UNION ALL
SELECT DISTINCT 2, syslog_priority_mapped, operation_mapped
FROM firewall_logs_mapped WHERE operation = 'Built'
UNION ALL
SELECT DISTINCT 3, operation_mapped, message_code_mapped
FROM firewall_logs_mapped WHERE operation = 'Built'
...etc...
;
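One way to make that comparison, sketched here for PostgreSQL, is to run each form under EXPLAIN (ANALYZE, BUFFERS), which executes the query and reports actual timings per plan node. Only the first two branches are shown; the table and column names are taken from the queries above.

```sql
-- Executes the query and reports actual timings and buffer usage.
-- Repeat with the WITH form and compare the total execution times.
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT 1, log_time_mapped, syslog_priority_mapped
FROM firewall_logs_mapped WHERE operation = 'Built'
UNION ALL
SELECT DISTINCT 2, syslog_priority_mapped, operation_mapped
FROM firewall_logs_mapped WHERE operation = 'Built';
```

Since EXPLAIN ANALYZE really runs the query, expect each measurement to take about as long as the query itself. Running each form more than once helps separate cache effects from the actual difference between the two.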
Best Answer
You need to use pg_index and limit the results to the rows where indisunique is true. To get the actual index and table names, you need to join that table to pg_class and pg_namespace: