SQL Server – How Adding Where Clause Skews Data

sql serversql-server-2000t-sql

This is the result set that I get returned using my query:

empname itemssold
JJ      14

But this is my desired result set:

empname itemssold
AA      0
BB      0
CC      0
DD      0
JJ      14
RR      0

It seems to pulls all employees until I add in the where clause to my query which makes me wonder if I am adding it in the wrong place, or am I setting up my query incorrectly?

Below is DDL & My query I use:

Create Table #ShowAll
(
  empname varchar(1000)
  ,empid varchar(100)
)

Create Table #Data
(
  empid varchar(100)
  ,itemssold int
  ,beenverified varchar(100)
)

Insert Into #ShowAll Values
('JJ', 'J1'), 
('AA', 'A1'), 
('BB', 'B1'), 
('CC', 'C1'),
('DD', 'D1'), 
('RR', 'R1')

Insert Into #Data Values
('J1','14', 'Yes'), 
('A1', '12','No'), 
('B1', '13', 'No')

Select sa.empname
    , Da.itemssold
FROM #ShowAll sa
LEFT JOIN #Data da
    ON sa.empid = da.empid
WHERE da.beenverified = 'Yes'

Best Answer

You must move the test on beenverified from the WHERE clause to LEFT ... ON ...:

SELECT sa.empname
    , coalesce(Da.itemssold, 0)
FROM #ShowAll sa
LEFT JOIN #Data da
    ON sa.empid = da.empid 
    AND da.beenverified = 'Yes';

With a LEFT JOIN, when there is no match on empid between the two tables, this query (minus the WHERE clause) returns NULL for beenverified (and other columns from #Data):

empname itemssold   beenverified
JJ      14          Yes
AA      12          No
BB      13          No
CC      NULL        NULL
DD      NULL        NULL
RR      NULL        NULL

If beenverified is tested in the WHERE clause, it does not return rows where the condition is not met:

  • beenverified IS NULL
  • beenverified <> 'Yes' (i.e. = 'No')

Only row JJ is then valid. With the test in the WHERE clause, this query behaves like an INNER JOIN ....

Coalesce is then added in order to replace NULL by 0.