Select Records Without Associations Outside a Certain List in MySQL

MySQLselect

I have a mySQL table of nodes and each node can be associated with many categories. I have a list of category ids that are acceptable (498, 499). I want to select only nodes who do not have categories that are not in the list of acceptable category ids.

In this example, I should get Toby and Joseph back, but not Susan…

enter image description here

I was initially trying this, but the IN clause does not exclude other categories:

SELECT n.nid, n.title, category.dir_phys_category_tid
FROM node AS n
INNER JOIN field_data_dir_phys_category   AS category   ON n.nid = category.entity_id
WHERE n.type = 'dir_physicians'
 AND n.`status` = 1
 AND category.dir_phys_category_tid IN ( 498,499 )

Answer: modified from RolandoMySQLDBA's answer below, with formatting & naming changes

SELECT desired_categories.* 
FROM
    ( SELECT nid, title, COUNT(1) AS all_categories 
        FROM node 
        INNER JOIN field_data_dir_phys_category AS category ON node.nid = category.entity_id
        GROUP BY nid
    ) AS A 

INNER JOIN
    ( SELECT nid, COUNT(1) AS good_categories 
        FROM node
        INNER JOIN field_data_dir_phys_category AS category ON node.nid = category.entity_id
        WHERE category.dir_phys_category_tid IN (498,499) 
        GROUP BY nid
    ) AS B USING (nid)

INNER JOIN 
    ( SELECT DISTINCT nid, title 
        FROM node 
    ) AS desired_categories USING (nid)

WHERE all_categories = good_categories;

Best Answer

Here is the query

SELECT C.* FROM
(SELECT nid,COUNT(1) fullcount 
  FROM node 
  INNER JOIN field_data_dir_phys_category   AS category   ON node.nid = category.entity_id
  GROUP BY nid) A INNER JOIN
(SELECT nid,COUNT(1) goodcount 
  FROM node
  INNER JOIN field_data_dir_phys_category   AS category   ON node.nid = category.entity_id
  WHERE dir_phys_category_tid IN (498,499) GROUP BY nid) B USING (nid)
INNER JOIN (SELECT DISTINCT nid,title FROM node) C USING (nid)
WHERE fullcount=goodcount;

First, let's make some sample data. NOTE: this uses a single table. The OP used two tables - mine is intended to demonstrate the method used to count categories, which is the central part of my answer.

DROP DATABASE IF EXISTS doub1ejack;
CREATE DATABASE doub1ejack;
USE doub1ejack
CREATE TABLE node
(
    id INT NOT NULL AUTO_INCREMENT,
    nid INT NOT NULL,
    title VARCHAR(20),
    dir_phys_category_tid INT NOT NULL,
    PRIMARY KEY (id)
);
INSERT INTO node (nid,title,dir_phys_category_tid) VALUES
(1171,'Toby',499),(1172,'Susan',500),(1172,'Susan',499),
(1172,'Susan',498),(1173,'Joseph',499),(1173,'Joseph',498);
SELECT * FROM node;

Let's load it

mysql> DROP DATABASE IF EXISTS doub1ejack;
Query OK, 1 row affected (0.01 sec)

mysql> CREATE DATABASE doub1ejack;
Query OK, 1 row affected (0.00 sec)

mysql> USE doub1ejack
Database changed
mysql> CREATE TABLE node
    -> (
    ->     id INT NOT NULL AUTO_INCREMENT,
    ->     nid INT NOT NULL,
    ->     title VARCHAR(20),
    ->     dir_phys_category_tid INT NOT NULL,
    ->     PRIMARY KEY (id)
    -> );
Query OK, 0 rows affected (0.23 sec)

mysql> INSERT INTO node (nid,title,dir_phys_category_tid) VALUES
    -> (1171,'Toby',499),(1172,'Susan',500),(1172,'Susan',499),
    -> (1172,'Susan',498),(1173,'Joseph',499),(1173,'Joseph',498);
Query OK, 6 rows affected (0.00 sec)
Records: 6  Duplicates: 0  Warnings: 0

mysql>

Let's look at it

mysql> SELECT * FROM node;
+----+------+--------+-----------------------+
| id | nid  | title  | dir_phys_category_tid |
+----+------+--------+-----------------------+
|  1 | 1171 | Toby   |                   499 |
|  2 | 1172 | Susan  |                   500 |
|  3 | 1172 | Susan  |                   499 |
|  4 | 1172 | Susan  |                   498 |
|  5 | 1173 | Joseph |                   499 |
|  6 | 1173 | Joseph |                   498 |
+----+------+--------+-----------------------+
6 rows in set (0.00 sec)

mysql>

Next, let's run the proposed query

mysql> SELECT C.* FROM
    -> (SELECT nid,COUNT(1) fullcount FROM node GROUP BY nid) A
    -> INNER JOIN
    -> (SELECT nid,COUNT(1) goodcount FROM node
    -> WHERE dir_phys_category_tid IN (498,499)
    -> GROUP BY nid) B USING (nid)
    -> INNER JOIN
    -> (SELECT DISTINCT nid,title FROM node) C USING (nid)
    -> WHERE fullcount=goodcount;
+------+--------+
| nid  | title  |
+------+--------+
| 1171 | Toby   |
| 1173 | Joseph |
+------+--------+
2 rows in set (0.00 sec)

mysql>

OK why does it work ? Let's add the count columns, remove the WHERE and see

mysql> SELECT C.*,fullcount,goodcount FROM
    -> (SELECT nid,COUNT(1) fullcount FROM node GROUP BY nid) A
    -> INNER JOIN
    -> (SELECT nid,COUNT(1) goodcount FROM node
    -> WHERE dir_phys_category_tid IN (498,499)
    -> GROUP BY nid) B USING (nid)
    -> INNER JOIN
    -> (SELECT DISTINCT nid,title FROM node) C USING (nid)
    -> ;
+------+--------+-----------+-----------+
| nid  | title  | fullcount | goodcount |
+------+--------+-----------+-----------+
| 1171 | Toby   |         1 |         1 |
| 1172 | Susan  |         3 |         2 |
| 1173 | Joseph |         2 |         2 |
+------+--------+-----------+-----------+
3 rows in set (0.00 sec)

mysql>

Here is the correlation: When the number of categories (fullcount) is equal to the number of categories which are 498 and/or 499 (goodcount), then that row's nid is acceptable.

Give it a Try !!!