Soft delete field causes left join not to return the “left” side data

join;

I've got a parent table of trainings and a child table of trainingattendees.
That is a training can be attended by multiple attendees.
I have a bit field in my trainingattendee table called Deleted that has a default value of (0). That is the record is not deleted. This value can change to 1 if the record is deleted.

I was simply doing a left join from trainings to trainingattendees to get some data but if ALL my training attendees table rows are marked as deleted (1) I don't get my training record (the left side table)

That is this works:

SELECT
    Training.TrainingID AS TrainingID
FROM 
    Training 
LEFT OUTER JOIN
    TrainingAttendee ta
ON
    ta.TrainingID=Training.TrainingID   
WHERE 
    LOWER(Training.Title) = LOWER('Courtesy Dust Mask Training') 
    AND DATEADD(d, DATEDIFF(d, 0, Training.StartDate), 0) >= DATEADD(d, DATEDIFF(d, 0, GetDate()), 0)
    AND (ta.TrainingAttendeeStatusID=4)
GROUP BY
    Training.TrainingID

The issue here is this works because I am not filtering out my training attendee records for deleted ones. I have created a record in my training attendee table but marked it as deleted and reran this same query with one additional where condition.

SELECT
    Training.TrainingID AS TrainingID
FROM 
    Training 
LEFT OUTER JOIN
    TrainingAttendee ta
ON
    ta.TrainingID=Training.TrainingID   
WHERE 
    LOWER(Training.Title) = LOWER('Courtesy Dust Mask Training') 
    AND DATEADD(d, DATEDIFF(d, 0, Training.StartDate), 0) >= DATEADD(d, DATEDIFF(d, 0, GetDate()), 0)
    AND (ta.Deleted=0)
    AND (ta.TrainingAttendeeStatusID=4)
GROUP BY
    Training.TrainingID

By adding this: AND (ta.Deleted=0)

I then thought I could change this condition to this:

AND (ta.Deleted=0 OR ta.Deleted IS NULL)

But this too did not work…How can I properly do this such that I get my "left" table data ?

Best Answer

(Note: completely updated, when I realized your problems was a bit different than I initially thought).

As you probably realize (given the change you made), if you check for a non-NULL value on a LEFT JOIN column in your WHERE clause, it can make your LEFT JOIN act like an INNER JOIN.

There are two ways to handle this problem:

  1. You can perform all your checks in the JOINs ON clause
  2. You can check for the condition you're interested in within the WHERE clause, but OR that with a condition that is true when there are no rows found from the "right" table.

However, this can have two different effects.

When you check a condition in the ON clause, that becomes a part of the condition for joining the rows from the "right" table to the "left" table. So, if a row fails to meet your condition, it has no impact on the rows from the "left" table - it's as if it doesn't exist.

On the other hand, when you're checking in the WHERE clause, it is already tied to the data form the "left" table. So, if there are rows from the "right" table, but none of them match your condition, the rows from the "left" table do get excluded.

With that in mind, let's take a look at what you're trying to do. You want training classes where the trainees (if any) are in status 4, and you want to ignore any trainees marked as deleted.

So, try this instead:

SELECT
    Training.TrainingID AS TrainingID
FROM 
    Training 
LEFT OUTER JOIN
    TrainingAttendee ta
ON
    (ta.TrainingID=Training.TrainingID AND ta.Deleted = 0)
WHERE 
    LOWER(Training.Title) = LOWER('Courtesy Dust Mask Training') 
    AND DATEADD(d, DATEDIFF(d, 0, Training.StartDate), 0) >= DATEADD(d, DATEDIFF(d, 0, GetDate()), 0)
    AND (ta.TrainingAttendeeStatusID=4 OR ta.TrainingID IS NULL)
GROUP BY
    Training.TrainingID

This should treat any ta records marked as deleted as if they don't exist, but will only return Training records is they have no attendees, or if they have non-deleted attendees in the desired status.

NOTE: This does change your original query (which you said was working correctly), which was basically acting as an INNER JOIN. Try the new query with and without the OR ta.TrainingID IS NULL in the status check, to see if you really if the results are different, and which results you really wanted.

UPDATE: Further discussion with JonH revealed that he was trying to get a count of the number of seats in each class (where, logically enough, deleted attendees shouldn't be counted). He'd added the following to his query's SELECT list:

   ,ISNULL(COUNT(*),0) AS TakenSeats

However, he was still getting back 1 seat as taken, even if all attendees had been deleted. I explained that this was due to the fact that COUNT(*) was counting the number of rows returned - and even if no TrainingAttendee row was found, you'd still get one row from Training. I recommended changing this to:

   ,ISNULL(COUNT(ta.TrainingID),0) AS TakenSeats

which (since COUNT ignores NULL values) should return 0 when there are no TrainingAttendee rows found for a given Training row.