The execution plan shown does not seem to match the big SELECT DISTINCT query, because the Sort and Unique steps are missing. Anyway, you are correct that when retrieving ~50% of a table, indexes don't help. The best strategy is a big sequential scan of the main table, and only fast hardware helps with that.
For the 2nd part of the question:
How would I go about selecting only the unique combinations of
adjacent columns? Is this too complicated a task to perform through a
database query? Would it speed up the query?
To remove duplicate combinations of adjacent columns, the structure of the result set should be changed so that each output row holds only one pair of adjacent columns, along with their corresponding dimensions in the parallel coordinates graph. In fact, the dimension of the 2nd column is not needed, since it is always the dimension of the first column plus one.
In one single query, this could be written like this:
WITH logs as (
SELECT log_time_mapped, syslog_priority_mapped,
operation_mapped, message_code_mapped, protocol_mapped,
source_ip_mapped, destination_ip_mapped,
source_port_mapped, destination_port_mapped,
destination_service_mapped, direction_mapped,
connections_built_mapped, connections_torn_down_mapped,
hourofday_mapped, meridiem_mapped
FROM firewall_logs_mapped
WHERE operation = 'Built')
SELECT DISTINCT 1, log_time_mapped, syslog_priority_mapped FROM logs
UNION ALL
SELECT DISTINCT 2, syslog_priority_mapped, operation_mapped FROM logs
UNION ALL
SELECT DISTINCT 3, operation_mapped, message_code_mapped FROM logs
UNION ALL
...etc...
SELECT DISTINCT 14, hourofday_mapped, meridiem_mapped FROM logs
;
The first SELECT DISTINCT subquery extracts the lines to draw between dimensions 1 and 2, the next subquery between dimensions 2 and 3, and so on. DISTINCT eliminates duplicates, so the client side doesn't have to do it. The UNION ALL concatenates the results without any further processing.
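To make the shape of that result set concrete, here is a minimal sketch using Python's standard-library sqlite3 module rather than the original database. The table toy_logs and its columns a_mapped, b_mapped and c_mapped are hypothetical stand-ins for firewall_logs_mapped and its mapped columns; only the query pattern is taken from the answer.

```python
import sqlite3

# Hypothetical toy table standing in for firewall_logs_mapped (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE toy_logs (a_mapped INTEGER, b_mapped INTEGER, c_mapped INTEGER)"
)
conn.executemany(
    "INSERT INTO toy_logs VALUES (?, ?, ?)",
    [(1, 10, 100), (1, 10, 200), (2, 10, 100), (1, 10, 100)],
)

# One SELECT DISTINCT per pair of adjacent columns, tagged with the
# dimension number of the left-hand column, concatenated with UNION ALL.
pairs_sql = """
SELECT DISTINCT 1 AS dim, a_mapped AS left_val, b_mapped AS right_val FROM toy_logs
UNION ALL
SELECT DISTINCT 2, b_mapped, c_mapped FROM toy_logs
"""
pairs = sorted(conn.execute(pairs_sql).fetchall())

# Each row is one line segment to draw between dimension dim and dim + 1.
for dim, left_val, right_val in pairs:
    print(dim, left_val, right_val)
```

Each output row describes one segment between two neighbouring axes, so the client can draw it directly without deduplicating anything itself.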
However, it's a heavy query, and it's doubtful that it would be any faster than what you're already doing.
The WITH subquery is likely to be materialized on disk, which is slow, so it might be interesting to compare the execution time with this other form, which repeats the same condition in each branch:
SELECT DISTINCT 1, log_time_mapped, syslog_priority_mapped
FROM firewall_logs_mapped WHERE operation = 'Built'
UNION ALL
SELECT DISTINCT 2, syslog_priority_mapped, operation_mapped
FROM firewall_logs_mapped WHERE operation = 'Built'
UNION ALL
SELECT DISTINCT 3, operation_mapped, message_code_mapped
FROM firewall_logs_mapped WHERE operation = 'Built'
...etc...
;
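Before timing the two forms against each other, it is worth confirming that they return the same rows. The following is a rough sketch with Python's sqlite3 on a hypothetical miniature table (toy_logs and its columns are assumptions, not the real schema). The timings it prints say nothing about the real database, whose planner and CTE handling differ; only the comparison harness is meant to carry over.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
# Hypothetical miniature of firewall_logs_mapped (assumption).
conn.execute(
    "CREATE TABLE toy_logs "
    "(operation TEXT, a_mapped INTEGER, b_mapped INTEGER, c_mapped INTEGER)"
)
conn.executemany(
    "INSERT INTO toy_logs VALUES (?, ?, ?, ?)",
    [("Built", i % 5, i % 3, i % 7) for i in range(1000)]
    + [("Teardown", i % 5, i % 3, i % 7) for i in range(1000)],
)

# Form 1: filter once in a WITH subquery, then reuse it in each branch.
cte_form = """
WITH logs AS (
  SELECT a_mapped, b_mapped, c_mapped FROM toy_logs WHERE operation = 'Built')
SELECT DISTINCT 1, a_mapped, b_mapped FROM logs
UNION ALL
SELECT DISTINCT 2, b_mapped, c_mapped FROM logs
"""

# Form 2: repeat the same WHERE condition in every branch.
repeated_form = """
SELECT DISTINCT 1, a_mapped, b_mapped FROM toy_logs WHERE operation = 'Built'
UNION ALL
SELECT DISTINCT 2, b_mapped, c_mapped FROM toy_logs WHERE operation = 'Built'
"""

def timed(sql):
    """Run sql once and return (sorted rows, elapsed seconds)."""
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    return sorted(rows), time.perf_counter() - start

cte_rows, cte_secs = timed(cte_form)
rep_rows, rep_secs = timed(repeated_form)
print("same rows:", cte_rows == rep_rows, "row count:", len(cte_rows))
print(f"WITH form: {cte_secs:.6f}s, repeated form: {rep_secs:.6f}s")
```

On the real table, the same comparison is better done with the database's own plan and timing output, run several times to smooth out caching effects.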
I am using SQL Server because I do not have Access installed; hopefully this is generic enough to be useful to you:
Sample tables and data
CREATE TABLE SupplierCountry
(
SupplierName varchar(50) NOT NULL,
CountryName varchar(50) NOT NULL
);
INSERT SupplierCountry
(SupplierName, CountryName)
VALUES
('Supplier A', 'USA'),
('Supplier A', 'France'),
('Supplier A', 'China');
SupplierCountry
╔══════════════╦═════════════╗
║ SupplierName ║ CountryName ║
╠══════════════╬═════════════╣
║ Supplier A ║ USA ║
║ Supplier A ║ France ║
║ Supplier A ║ China ║
╚══════════════╩═════════════╝
SupplierFactory
CREATE TABLE SupplierFactory
(
SupplierName varchar(50) NOT NULL,
CountryName varchar(50) NOT NULL
);
INSERT SupplierFactory
(SupplierName, CountryName)
VALUES
('Supplier A', 'UK'),
('Supplier A', 'Germany');
╔══════════════╦═════════════╗
║ SupplierName ║ CountryName ║
╠══════════════╬═════════════╣
║ Supplier A ║ UK ║
║ Supplier A ║ Germany ║
╚══════════════╩═════════════╝
Query:
SELECT
SC.SupplierName AS Supplier,
SC.CountryName AS SupplyCountry,
SF.CountryName AS FactoryCountry
FROM SupplierCountry AS SC
JOIN SupplierFactory AS SF
ON SF.SupplierName = SC.SupplierName;
Output:
╔════════════╦═══════════════╦════════════════╗
║ Supplier ║ SupplyCountry ║ FactoryCountry ║
╠════════════╬═══════════════╬════════════════╣
║ Supplier A ║ USA ║ UK ║
║ Supplier A ║ France ║ UK ║
║ Supplier A ║ China ║ UK ║
║ Supplier A ║ USA ║ Germany ║
║ Supplier A ║ France ║ Germany ║
║ Supplier A ║ China ║ Germany ║
╚════════════╩═══════════════╩════════════════╝
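The six rows arise because the join matches on SupplierName only, so each of the 3 supply countries is paired with each of the 2 factory countries. A small sketch reproducing this with Python's sqlite3 in place of SQL Server (an assumption made so the example is self-contained; the INSERT INTO spelling is used for portability):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same sample tables and data as above, created in an in-memory SQLite database.
conn.executescript("""
CREATE TABLE SupplierCountry (
    SupplierName varchar(50) NOT NULL,
    CountryName  varchar(50) NOT NULL
);
CREATE TABLE SupplierFactory (
    SupplierName varchar(50) NOT NULL,
    CountryName  varchar(50) NOT NULL
);
INSERT INTO SupplierCountry (SupplierName, CountryName) VALUES
    ('Supplier A', 'USA'), ('Supplier A', 'France'), ('Supplier A', 'China');
INSERT INTO SupplierFactory (SupplierName, CountryName) VALUES
    ('Supplier A', 'UK'), ('Supplier A', 'Germany');
""")

# Join on the supplier name only: every supply country meets every factory country.
rows = conn.execute("""
SELECT SC.SupplierName AS Supplier,
       SC.CountryName  AS SupplyCountry,
       SF.CountryName  AS FactoryCountry
FROM SupplierCountry AS SC
JOIN SupplierFactory AS SF
  ON SF.SupplierName = SC.SupplierName
""").fetchall()

for row in rows:
    print(row)
print(len(rows), "rows")
```

Row order is not guaranteed without an ORDER BY, so it may differ from the output table above.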
Best Answer
Try something like