SQLite – Strategy to Find Unique Bivariate Polynomials

sqliteunique-constraint

I have table that stores Tutte polynomials of a large set of graphs. These are bivariate polynomials with integer coefficients. An example of a polynomial taken from Wikipedia is:

enter image description here

Since the polynomial may be "sparse", I've saved the data for each graph in an SQLITE table as:

CREATE TABLE IF NOT EXISTS tutte_polynomial(
    graph_id   INTEGER,
    x_degree   UNSIGNED INTEGER,
    y_degree   UNSIGNED INTEGER,
    coeff      INTEGER
);

What I'd like to do now is determine the number of unique polynomials over the graph_id. When my database was small, I used a python script to store each polynomial as a tuple in a dictionary – this solution runs out of memory for larger databases.

Best Answer

The number of polynomials is the number of distinct graph_id values. (You probably have a separate table for graphs; in this case, the SELECT DISTINCT subqueries can be replaced with SELECT graph_id FROM graph.)

To count unique polynomials, we exclude any that are duplicates.

A polynomial is a duplicate if there exists any other polynomial with a smaller ID (the smallest ID would be the non-duplicate) and with the same coefficients.

Two polynomials have the same coefficients if there are not any differences in the possible x/y/coefficient combinations for both, i.e., for each row of one polynomial, the same row must exist for the other polynomial. In other words, there must not exist any row that does not have a match for the other polynomial.

And now that we have the description in the language of set theory, we can translate it directly into SQL. (I've used compound SELECTs for the innermost comparisons to avoid yet another level of negated subquery lookups.)

SELECT COUNT(*)
FROM (SELECT DISTINCT graph_id
      FROM tutte_polynomial) AS a
WHERE NOT EXISTS (SELECT 1
                  FROM (SELECT DISTINCT graph_id
                        FROM tutte_polynomial) AS b
                  WHERE b.graph_id < a.graph_id
                    AND NOT EXISTS (SELECT x_degree, y_degree, coeff
                                    FROM tutte_polynomial
                                    WHERE graph_id = a.graph_id
                                    EXCEPT
                                    SELECT x_degree, y_degree, coeff
                                    FROM tutte_polynomial
                                    WHERE graph_id = b.graph_id)
                    AND NOT EXISTS (SELECT x_degree, y_degree, coeff
                                    FROM tutte_polynomial
                                    WHERE graph_id = b.graph_id
                                    EXCEPT
                                    SELECT x_degree, y_degree, coeff
                                    FROM tutte_polynomial
                                    WHERE graph_id = a.graph_id))

If computing the count dynamically is too slow, you could try to create a temporary table that contains the data in a format that is easier to count:

CREATE TABLE polys_as_string AS
SELECT group_concat(data)
FROM (SELECT graph_id,
             x_degree || '|' || y_degree || '|' coeff AS data
      FROM tutte_polynomial
      ORDER BY graph_id, x_degree, y_degree)
GROUP BY graph_id;

CREATE INDEX polys_as_string_index on polys_as_string(data);

SELECT COUNT(DISTINCT data) FROM polys_as_string;

Related Solutions

“standard way” to log network messages (events) into a database

It seems that the very same question has been asked (and answered) on stackoveflow:

https://stackoverflow.com/questions/2797592/best-practice-logging-events-general-and-changes-database

Mysql – Unique combination key MySQL

Here is before trigger you can use to guarantee that id1 and id2 are inserted in min/max order

DELIMITER $$
CREATE TRIGGER mytable_bi BEFORE INSERT ON mytable
FOR EACH ROW
BEGIN
    DECLARE sum INT;
    SET sum = NEW.id1 + NEW.id2;
    SET NEW.id1 = (NEW.id1 + NEW.id2 - ABS(NEW.id1 - NEW.id2)) / 2;
    SET NEW.id2 = sum - NEW.id1;
END $$
DELIMITER ;

Here is some Sample Data:

DROP DATABASE IF EXISTS ryanzec;
CREATE DATABASE ryanzec;
USE ryanzec
CREATE TABLE mytable
(
    id1 INT NOT NULL,
    id2 INT NOT NULL,
    mapTypeId INT NOT NULL,
    primary key (id1,id2)
);
DELIMITER $$
CREATE TRIGGER mytable_bi BEFORE INSERT ON mytable
FOR EACH ROW
BEGIN
    DECLARE sum INT;
    SET sum = NEW.id1 + NEW.id2;
    SET NEW.id1 = (NEW.id1 + NEW.id2 - ABS(NEW.id1 - NEW.id2)) / 2;
    SET NEW.id2 = sum - NEW.id1;
END $$
DELIMITER ;
INSERT INTO mytable VALUES (1,2,3);
INSERT INTO mytable VALUES (2,1,3);
INSERT INTO mytable VALUES (5,4,6);
INSERT INTO mytable VALUES (4,5,6);
SELECT * FROM mytable;

Here it is the Sample Data Loaded

mysql> DROP DATABASE IF EXISTS ryanzec;
Query OK, 1 row affected (0.03 sec)

mysql> CREATE DATABASE ryanzec;
Query OK, 1 row affected (0.00 sec)

mysql> USE ryanzec
Database changed
mysql> CREATE TABLE mytable
    -> (
    ->     id1 INT NOT NULL,
    ->     id2 INT NOT NULL,
    ->     mapTypeId INT NOT NULL,
    ->     primary key (id1,id2)
    -> );
Query OK, 0 rows affected (0.10 sec)

mysql> DELIMITER $$
mysql> CREATE TRIGGER mytable_bi BEFORE INSERT ON mytable
    -> FOR EACH ROW
    -> BEGIN
    ->     DECLARE sum INT;
    ->     SET sum = NEW.id1 + NEW.id2;
    ->     SET NEW.id1 = (NEW.id1 + NEW.id2 - ABS(NEW.id1 - NEW.id2)) / 2;
    ->     SET NEW.id2 = sum - NEW.id1;
    -> END $$
Query OK, 0 rows affected (0.10 sec)

mysql> DELIMITER ;

Watch What Happens When I Insert the Rows:

mysql> INSERT INTO mytable VALUES (1,2,3);
Query OK, 1 row affected (0.06 sec)

mysql> INSERT INTO mytable VALUES (2,1,3);
ERROR 1062 (23000): Duplicate entry '1-2' for key 'PRIMARY'
mysql> INSERT INTO mytable VALUES (5,4,6);
Query OK, 1 row affected (0.07 sec)

mysql> INSERT INTO mytable VALUES (4,5,6);
ERROR 1062 (23000): Duplicate entry '4-5' for key 'PRIMARY'
mysql> SELECT * FROM mytable;
+-----+-----+-----------+
| id1 | id2 | mapTypeId |
+-----+-----+-----------+
|   1 |   2 |         3 |
|   4 |   5 |         6 |
+-----+-----+-----------+
2 rows in set (0.00 sec)

mysql>

Now, no matter what order id1 and id2 are entered, id1 will always be less than or equal to id2

This may not be contextually what you want because id1 and id2 are unique identifiers.

Best Answer

Related Solutions

“standard way” to log network messages (events) into a database

Mysql – Unique combination key MySQL

Related Question