MySQL – Select Statement Ordered by Count of Matching Rows from Another Table

MySQL

I am trying to select the most relevant products from a table that match the currently viewed product based on the number of matching product categories.

Sample table A

-------------------------------------
| ID | Name | Desc | Price | Active |
-------------------------------------
| 1  | Car  | NR   | $1.00 | Y      |
-------------------------------------
| 2  | Phone| NR   | $2.00 | Y      |
-------------------------------------
| 3  | Pen  | NR   | $1.00 | Y      |
-------------------------------------
| 4  | Book | NR   | $2.00 | Y      |
-------------------------------------
| 5  | Light| NR   | $3.00 | Y      |
-------------------------------------
| 6  | Rock | NR   | $1.00 | Y      |
-------------------------------------
| 7  | Plum | NR   | $1.00 | N      |
-------------------------------------

Sample table B

---------------
| ID | Cat_ID |
---------------
| 1  | 2      |
---------------
| 1  | 3      |
---------------
| 2  | 1      |
---------------
| 2  | 2      |
---------------
| 2  | 5      |
---------------
| 3  | 3      |
---------------
| 4  | 2      |
---------------
| 4  | 3      |
---------------
| 4  | 6      |
---------------
| 5  | 3      |
---------------
| 6  | 2      |
---------------
| 6  | 3      |
---------------
| 6  | 4      |
---------------
| 7  | 2      |
---------------
| 7  | 3      |
---------------
| 7  | 4      |
---------------

When looking at product ID 1 I want to run a query to show the viewer products matching ID's 4 and 6 as other possible options since they share the most category_ID's with number 1 and they are active (Y). In reality, the products might share anywhere from 1 to 30 different categories. It is fine if there are 40 matches, I will use limit 2 to get only two of them.

The issue I am having is that I cannot figure out how to match both the active column from table A and the cat_ID column from Table B against the array of resulting ID's from the Table B query.

The result I want to receive from Table A:

--------------------------------
| 4  | Book | NR   | $2.00 | Y |
--------------------------------
| 6  | Rock | NR   | $1.00 | Y |
--------------------------------

Best Answer

as variant:

SET @ID = 1;
SELECT
    ID,
    count(*) as count_matched
FROM TableB
WHERE 
    ID != @ID
    AND
    CAT_ID IN (SELECT CAT_ID FROM TableB WHERE ID = @ID )
GROUP BY ID
ORDER BY count(*) DESC
LIMIT 2;

add after comment:

MySQL do not support LIMIT in subqueries, so You can:

  • Handle returned values in application and use it for prepared statements as parameters
  • Use temporary tables
SET @ID = 1;
DROP TEMPORARY TABLE IF EXISTS tt;
CREATE TEMPORARY TABLE tt as
SELECT
    ID,
    count(*) as count_matched
FROM TableB
WHERE 
    ID != @ID
    AND
    CAT_ID IN (SELECT CAT_ID FROM TableB WHERE ID = @ID )
GROUP BY ID
ORDER BY count(*) DESC
LIMIT 2;
SELECT    
    TableA.* FROM TableA WHERE     ID IN (SELECT ID FROM tt);

temporary table is session, so You could use it with same name independent for each session.

But if handle returned values in application - prevent table locks, because CREATE TABLE AS SELECT (and INSERT SELECT) - require lock table, with external code - it just 2 independent selects.