Accelerating a SQL quadruple self-join with a complex alternation in the WHERE clause

query-performanceself-joinsqlite

The following Sqlite query has a triple self-join on table t:

SELECT "update_variable",
    lhs_acc.name_suffix || ":" || rhs_var.name_suffix,
    op.span,
    op.path
FROM t op
JOIN t lhs_acc ON (lhs_acc.path GLOB op.path || "?*")
JOIN t rhs_acc ON (rhs_acc.path GLOB op.path || "?*")
JOIN t rhs_var ON (rhs_var.path GLOB op.path || "?*")
WHERE (op.name_prefix = "assignment"
    AND lhs_acc.name_prefix = "assignment_lhs_identifier"
    AND rhs_acc.name_prefix = "assignment_rhs_atom"
    AND rhs_var.name_prefix = "assignment_rhs_atom"
    AND rhs_acc.name_suffix != rhs_var.name_suffix
    AND lhs_acc.name_suffix = rhs_acc.name_suffix)
OR (op.name_prefix = "augmented_assignment"
    AND lhs_acc.name_prefix = "assignment_lhs_identifier"
    AND rhs_acc.name_prefix = "assignment_rhs_atom"
    AND rhs_var.name_prefix = "assignment_rhs_atom")
OR (op.name_prefix = "method_call"
    AND rhs_acc.name_prefix = "method_call_name"
    AND lhs_acc.name_prefix = "method_call_object"
    AND rhs_var.name_prefix = "call_argument"
    AND rhs_acc.name_suffix != rhs_var.name_suffix
    AND rhs_acc.name_suffix REGEXP "(append|extend|insert|add|update)$")
GROUP BY op.span,
        lhs_acc.name,
        rhs_var.name

It is far too slow for my application. I can accelerate it enough by restricting each instance of t to the lines I am interested in, which somehow duplicates the WHERE conditions. So, the FROM clause becomes:

FROM
(SELECT *
FROM t
WHERE t.name_prefix IN ("assignment",
                        "augmented_assignment",
                        "method_call")) op
JOIN
(SELECT *
FROM t
WHERE t.name_prefix IN ("assignment_lhs_identifier",
                        "method_call_object") ) lhs_acc ON (lhs_acc.path GLOB op.path || "?*")
JOIN
(SELECT *
FROM t
WHERE t.name_prefix IN ("assignment_rhs_atom",
                        "method_call_name") ) rhs_acc ON (rhs_acc.path GLOB op.path || "?*")
JOIN
(SELECT *
FROM t
WHERE t.name_prefix IN ("assignment_rhs_atom",
                        "call_argument") ) rhs_var ON (rhs_var.path GLOB op.path || "?*")

Is there some technique which would lead to a less verbose / ugly / redundant query ?


Edit: The test duplication (but not the ugliness) can be avoided as follows:

SELECT "update_variable",
    lhs_acc.name_suffix || ":" || rhs_var.name_suffix,
    op.span,
    op.path
FROM
(SELECT *
FROM t
WHERE t.name_prefix IN ("assignment",
                        "augmented_assignment",
                        "method_call")) op
JOIN t lhs_acc ON (lhs_acc.name_prefix = (CASE op.name_prefix
                                            WHEN "method_call" THEN "method_call_object"
                                            ELSE "assignment_lhs_identifier"
                                        END)
                AND (lhs_acc.path GLOB op.path || "?*"))
JOIN t rhs_acc ON (rhs_acc.name_prefix = (CASE op.name_prefix
                                            WHEN "method_call" THEN "method_call_name"
                                            ELSE "assignment_rhs_atom"
                                        END)
                AND (rhs_acc.path GLOB op.path || "?*"))
JOIN t rhs_var ON (rhs_var.name_prefix = (CASE op.name_prefix
                                            WHEN "method_call" THEN "call_argument"
                                            ELSE "assignment_rhs_atom"
                                        END)
                AND (rhs_var.path GLOB op.path || "?*"))
WHERE op.name_prefix = "augmented_assignment"
OR (op.name_prefix = "assignment"
    AND lhs_acc.name_suffix = rhs_acc.name_suffix
    AND rhs_acc.name_suffix != rhs_var.name_suffix)
OR (op.name_prefix = "method_call"
    AND rhs_acc.name_suffix REGEXP "append|extend|insert|add|update)$"
    AND rhs_acc.name_suffix != rhs_var.name_suffix)
GROUP BY op.span,
        lhs_acc.name,
        rhs_var.name

Best Answer

In case someone is interested, I answer my own question: since I reuse a large number of this filtered tables in several queries, I ended up precalculating them. This address both the readability and performance issues.