Sql-server – LEFT OUTER JOIN-ing the same table, grouping, summing and finding 0 or NULL

sql server

I have a table where I record searches performed on my site, tbl_search. An id is automatically assigned to entries. If the searched phrase is more than one word, my system splits the words and performs another search on each of them, setting a column – called original – to the original search id.

For instance:

  1. A user searches for 'foo bar'
  2. The system searches for 'foo bar', gets the id that was created when recording that search, let's say the id is 5.
    So now we have a record like this

     id | phrase   | results | original
    ----+----------+---------+----------
      5 | foo bar  | 0       | NULL
    
  3. The system searches for 'foo' and 'bar' by themselves, and records the original id. So now the table looks like this:

     id | phrase   | results | original
    ----+----------+---------+----------
      5 | foo bar  | 0       | NULL
      6 | foo      | 0       | 5
      7 | bar      | 0       | 5
    

Any phrase with an original value was not specifically searched for, and the results should be grouped, summed, and counted as results for the original search.

I want to know all the searched phrases that got ZERO results, and order them based on how many times they were searched. Here's my SQL:

 SELECT 
      s1.phrase, 
      COUNT(s1.phrase) as count, 
      SUM(s2.results) as sub_results
 FROM tbl_search s1
 LEFT OUTER JOIN tbl_search s2 ON s1.id = s2.original
 WHERE s1.results = 0
 GROUP BY s1.phrase, s2.phrase
 HAVING SUM(s2.results) = 0
 OR SUM(s2.results) IS NULL
 ORDER BY count desc

I'm very close I think, but I'm getting something like this:

 phrase  | count | results | sub_results
---------+-------+---------+------------
 foo bar | 14    | 0       | NULL
 map     | 5     | 0       | 4

I would expect any multiple-word searches to return 0 for sub_results, and any single word searches to return NULL, however, it seems like the opposite is happening. Like this:

 phrase  | count | sub_results
---------+-------+-------------
 foo bar | 14    | 0  // multiple-word search with no results, even in the aggregate sum of sub results
 map     | 5     | NULL // single-word searches have no sub searches, so they will be NULL

EDIT:
Sorry, StackExchange is being partially blocked by my company's firewall so I can't add any comments. @ypercube I don't want to count the sub searches (because they're not performed by the user) in the count. The count should only be how many times that phrase was searched by a user.

Basically I want all searches that got no results. If a search had sub searches with results, then I'm not worried about that because to the user, they got results.

Also, I don't want distinct id's because that would not be an accurate count. It should group the s1.phrases together and ignore the ids.

I made an SQLFiddle: http://sqlfiddle.com/#!3/a2913/2

Best Answer

I think you may want something like the following. Note that there was an error in the SQL Fiddle, so the first query corrects the data so that the desired results are returned!

-- Correct an error in the data
-- The sub-phrases "hub" and "cap" were mapped to the original phrase "map" rather than "hub cap"
UPDATE tbl_search
SET original = 13 /* "hub cap" */
WHERE id IN (14, 15) /* "hub" and "cap" */

-- All phrases that had no results and also had no results for any of the sub-phrases
SELECT s1.phrase, COUNT(*) AS searchAttempts
FROM tbl_search s1
LEFT OUTER JOIN (
    SELECT DISTINCT original 
    FROM tbl_search
    WHERE results > 0
        AND original IS NOT NULL
) subPhrasesWithMatch
    ON subPhrasesWithMatch.original = s1.id
WHERE s1.original IS NULL /* Only original searches */
    AND s1.results = 0 /* Only searches with no results */
    AND subPhrasesWithMatch.original IS NULL /* We didn't match the join to sub-phrases that returned results */
GROUP BY s1.phrase
ORDER BY searchAttempts DESC
--phrase    searchAttempts
--foo bar   3
--map       2
--foo       1