SQLite – Strategy to Find Unique Bivariate Polynomials

sqliteunique-constraint

I have table that stores Tutte polynomials of a large set of graphs. These are bivariate polynomials with integer coefficients. An example of a polynomial taken from Wikipedia is:

enter image description here

Since the polynomial may be "sparse", I've saved the data for each graph in an SQLITE table as:

CREATE TABLE IF NOT EXISTS tutte_polynomial(
    graph_id   INTEGER,
    x_degree   UNSIGNED INTEGER,
    y_degree   UNSIGNED INTEGER,
    coeff      INTEGER
);

What I'd like to do now is determine the number of unique polynomials over the graph_id. When my database was small, I used a python script to store each polynomial as a tuple in a dictionary – this solution runs out of memory for larger databases.

Best Answer

The number of polynomials is the number of distinct graph_id values. (You probably have a separate table for graphs; in this case, the SELECT DISTINCT subqueries can be replaced with SELECT graph_id FROM graph.)

To count unique polynomials, we exclude any that are duplicates.

A polynomial is a duplicate if there exists any other polynomial with a smaller ID (the smallest ID would be the non-duplicate) and with the same coefficients.

Two polynomials have the same coefficients if there are not any differences in the possible x/y/coefficient combinations for both, i.e., for each row of one polynomial, the same row must exist for the other polynomial. In other words, there must not exist any row that does not have a match for the other polynomial.

And now that we have the description in the language of set theory, we can translate it directly into SQL. (I've used compound SELECTs for the innermost comparisons to avoid yet another level of negated subquery lookups.)

SELECT COUNT(*)
FROM (SELECT DISTINCT graph_id
      FROM tutte_polynomial) AS a
WHERE NOT EXISTS (SELECT 1
                  FROM (SELECT DISTINCT graph_id
                        FROM tutte_polynomial) AS b
                  WHERE b.graph_id < a.graph_id
                    AND NOT EXISTS (SELECT x_degree, y_degree, coeff
                                    FROM tutte_polynomial
                                    WHERE graph_id = a.graph_id
                                    EXCEPT
                                    SELECT x_degree, y_degree, coeff
                                    FROM tutte_polynomial
                                    WHERE graph_id = b.graph_id)
                    AND NOT EXISTS (SELECT x_degree, y_degree, coeff
                                    FROM tutte_polynomial
                                    WHERE graph_id = b.graph_id
                                    EXCEPT
                                    SELECT x_degree, y_degree, coeff
                                    FROM tutte_polynomial
                                    WHERE graph_id = a.graph_id))

If computing the count dynamically is too slow, you could try to create a temporary table that contains the data in a format that is easier to count:

CREATE TABLE polys_as_string AS
SELECT group_concat(data)
FROM (SELECT graph_id,
             x_degree || '|' || y_degree || '|' coeff AS data
      FROM tutte_polynomial
      ORDER BY graph_id, x_degree, y_degree)
GROUP BY graph_id;

CREATE INDEX polys_as_string_index on polys_as_string(data);

SELECT COUNT(DISTINCT data) FROM polys_as_string;
Related Question