This specific case (the exists
predicate) is a bit of an outlier as far as correlated subqueries go. ish...
Some correlated subqueries are certainly evaluated once for every row in the outer query, but I don't think you can apply that across the board, and I specifically think that's a statement that cannot be applied with authority to the exists
predicate.
The SQL example below this paragraph is highly contrived, but nevertheless illustrates an "evaluated once for each row in the outer query" correlated subquery. It's "highly contrived" because a straight relational join would be a much better solution, and in this simple case the join is super easy to write.
select
team_name
, (
select league_name
from leagues
where leagues.league_id = teams.league_id
) as league_name
from teams
In this instance, each row in the result set is going to contain the name of the league that the team belongs to, because of the correlated subquery. And the correlated subquery is probably going to be run once for each row in the outer query. Maybe.
The thing is, the database's query optimizer might rewrite this as a more efficient join, depending on the database. So even this simple example might not be executed "once per row in the outer query."
Either way, the subquery is dependent on the values returned from the table in the outer query, and that makes it a correlated subquery--because of the correlation with the outer table.
Part of the point of SQL in general is that the specific implementation is not important to you.
You should think in terms of sets of data (think set arithmetic), not in terms of iterations and comparisons per iteration.
The exists
predicate doesn't make you any promises about the particular implementation. In general, exists
is supposed to take advantage of table statistics and indexes in such a way that it produces a more efficient query plan than many other approaches will (and more efficient than you're likely to come up with on your own).
The actual query that ends up being run is going to be some kind of join, or multiple joins. But you don't need to get worked up about that. That's why your exists
clause contains select *
instead of identifying specific fields.
The query plan produced by the database engine will figure out which fields it wants to use, which key or keys and which index or indexes matter, etc.
So your exists
example is still a correlated subquery, despite the underlying implementation, because the inner query refers to and is logically dependent upon the table in the outer query.
SELECT DISTINCT store_type FROM stores
WHERE EXISTS
(
SELECT * FROM cities_stores
WHERE cities_stores.store_type = stores.store_type
);
The syntax for IN(...) is:
test_expression [ NOT ] IN
( subquery | expression [ ,...n ] )
With:
test_expression
Is any valid expression.
subquery
Is a subquery that has a result set of one column. This column must have the same data type as test_expression.
expression[ ,... n ]
Is a list of expressions to test for a match. All expressions must be of the same type as test_expression.
This means that only 1 column is allowed on both side. Even a list of expressions is considered as 1 dummy table with 1 column for expression values similar to:
SELECT exp FROM (values(exp1), (exp2), ...) as X(exp)
With 2 columns you can use a subquery with EXISTS:
Select VendorName, VendorCity, VendorState
FROM Vendors AS V1
WHERE NOT EXISTS (
SELECT 1
FROM Vendors AS V2
WHERE V2.VendorID <> V1.VendorID
AND VendorCity = VendorCity
AND VendorState = VendorState
)
Overall this subquery always return 0 row or 1 or more rows with 1 because it only needs to know if there is 1 or more rows with the same City and State.
The value is not important because it only looks at the number of row (=> if 1 or more rows EXISTS), hence the SELECT 1
. The test has already been done in the inner WHERE
clause.
I guess that VendorId is the PK. An index on VendorCity and VendorState would help.
Since you are learning and if you don't already know it, you can also look at the usage of APPLY (CROSS APPLY and OUTER APPLY). I let you try it and play with it.
Best Answer
Add an alias for the derived table before the
;
for example VendLargestUnpaidInv: