How Does a Subquery Use Main Query Columns

subquerywhere

I'm really confused by this SQL here, it's used in our app but I have no idea how it works. I put it in from examples I found online and it seems to have no issues.

What it's doing is grabbing the latest timestamp of a set. We couldn't use ORDER BY as there had to be multiple groupings. The actual query is much larger and this is just a cut down version showing the fundamental selection.

select * from table1 as main
WHERE NOT (EXISTS (select * from table1 sub
WHERE sub.datetime > main.datetime));

(Live online version: http://sqlfiddle.com/#!17/290cc/2)

What I think should be happening in my mind:

Two identical tables selected, the constraint in the subquery is saying:

Only select the rows from the subquery which have a date greater than any other row in the main query.

In this case:

  • Subquery id:1 has only a smaller than or equal to date value from all rows in the main query so it doesn't get selected.

  • Subquery id:2 has a date which is larger than id:1 from the main query selection, so id:2 gets selected once. It is not larger than anything else.

  • Subquery id:3 has a date which is larger than both id:1 and id:2 from the main query selection, so it gets selected twice.

So the return from the subquery should be (2, 3, 3). So the main query should do a selection which not exists in that set, which should return id:1, but it returns id:3.

Where is my misunderstanding?

Best Answer

Well, there are some misunderstandings about what the subquery is doing.

First of all, EXISTS works in this case by evaluating the subquery for every row of the main query, and returns true if the subquery returns even a single row.

Since you are using WHERE NOT EXISTS(....), what's happening for every row from the main query is the following:

  • For id = 1: The subquery evaluates if there exists a row in the same table, where the date is greater than the date for id = 1. And the results would be ids 2 and 3, hence EXISTS evaluates to true, and NOT EXISTS eliminates this row

  • For id = 2: Same as before, in this case the subquery would return id 3, hence EXISTS evaluates to true, and NOT EXISTS eliminates this row

  • For id = 3: There are no other rows in this table that have a date greater than it, so EXISTS evaluates to false, and NOT EXISTS returns true, so you get this row as a result