Mysql – How to filter table 2 before joining to table 1 so that the result still includes all of table 1, even if no match in table 2

conditionjoin;MySQLnull

The following returns all rows from table 1 where post_type = dlm_download. It groups like values from table 2 by the matching column in table 1 so that each row in table 1 has a maximum of one matching 'result' from table 2. The result is that all records from table 1 are displayed in all cases, simply displaying null for download_date when nothing in table 2 matches (meaning nobody has downloaded the file yet):

SELECT p.post_title AS 'Download title', p.ID AS 'Download ID', MAX(l.download_date) AS 'Last download'
FROM wp_posts p
LEFT JOIN wp_download_log l ON p.ID = l.download_id
WHERE p.post_type = 'dlm_download'
AND p.post_status = 'publish'
GROUP BY p.ID

I want to additionally filter out records from table 2 that match a few conditions…before being matched up with table 1. But the following additions to the query, placed in between WHERE and GROUP BY, ultimately whittle down the result from 70 rows to 59, eliminating the rows from table 1 even if nothing in table 2 matches (previously shown as NULL for the download_date)

AND l.user_agent NOT LIKE '%bot%'
AND l.user_agent NOT LIKE '%crawl%'
AND l.user_agent NOT LIKE '%panscient%'
AND l.user_agent NOT LIKE '%scrapy%'
AND l.user_agent NOT LIKE '%bubing%'
AND l.user_agent NOT LIKE 'Go-http-client/1.1'
AND l.user_agent NOT LIKE '%guzzle%'
AND l.user_agent NOT LIKE '%jetty%'
AND l.user_agent NOT LIKE '%resty%'

This is hard for me to explain or wrap my head around, but I think what is happening is the 11 items that have not yet been downloaded by humans are effectively excluded from the result by the above conditions. In other words, I think the conditions effectively say to not return anything that doesn't match these, and because NULL in table 2 doesn't match, it is eliminating results where there is something in table 1 but not table 2 (and thus NULL for table 2).

How can I return ALL records from table 1, wp_posts, including data from table 2 when a record matches (and NULL If not), but pre-filter table 2 by eliminating things where the user agent excludes bot etc?

Best Answer

If you want to reduce the number of rows you simply make a subselect and join that. See following query:

you should check the subquery if it returns all the wanted rows.

SELECT p.post_title AS 'Download title', p.ID AS 'Download ID', MAX(l.download_date) AS 'Last download'
FROM wp_posts p
LEFT JOIN (SELECT * 
           FROM wp_download_log 
           WHERE user_agent NOT LIKE '%bot%'
             AND user_agent NOT LIKE '%crawl%'
             AND user_agent NOT LIKE '%panscient%'
             AND user_agent NOT LIKE '%scrapy%'
             AND user_agent NOT LIKE '%bubing%'
             AND user_agent NOT LIKE 'Go-http-client/1.1'
             AND user_agent NOT LIKE '%guzzle%'
             AND user_agent NOT LIKE '%jetty%'
             AND user_agent NOT LIKE '%resty%') l ON p.ID = l.download_id
WHERE p.post_type = 'dlm_download'
AND p.post_status = 'publish'
GROUP BY p.ID